From stuartb at 4gh.net Fri Oct 3 18:19:08 2014
From: stuartb at 4gh.net (Stuart Barkley)
Date: Fri, 3 Oct 2014 13:19:08 -0400 (EDT)
Subject: [gpfsug-discuss] filesets and mountpoint naming
Message-ID:

Resent: First copy sent Sept 23. Maybe stuck in a moderation queue?

When we first started using GPFS we created several filesystems and just directly mounted them where it seemed appropriate. We have something like:

/home
/scratch
/projects
/reference
/applications

We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now).

We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems.

We have multiple compute clusters with multiple gpfs systems; one cluster has a traditional gpfs system and a separate gss system, which will obviously need multiple mount points. We also want to consider possible future cross-cluster mounts.

Some thoughts are to just do filesystems as:

/gpfs01, /gpfs02, etc.
/mnt/gpfs01, etc.
/mnt/clustera/gpfs01, etc.

What have other people done? Are you happy with it? What would you do differently?

Thanks,
Stuart
--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone

From bbanister at jumptrading.com Mon Oct 6 16:17:44 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Mon, 6 Oct 2014 15:17:44 +0000
Subject: [gpfsug-discuss] filesets and mountpoint naming
In-Reply-To:
References:
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com>

There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc.) if there is a file system issue that would cause these commands to hang.

Beyond that, the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley
Sent: Friday, October 03, 2014 12:19 PM
To: gpfsug-discuss at gpfsug.org
Subject: [gpfsug-discuss] filesets and mountpoint naming

Resent: First copy sent Sept 23. Maybe stuck in a moderation queue?

When we first started using GPFS we created several filesystems and just directly mounted them where it seemed appropriate. We have something like:

/home
/scratch
/projects
/reference
/applications

We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now).

We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems.

We have multiple compute clusters with multiple gpfs systems; one cluster has a traditional gpfs system and a separate gss system, which will obviously need multiple mount points. We also want to consider possible future cross-cluster mounts.

Some thoughts are to just do filesystems as:

/gpfs01, /gpfs02, etc.
/mnt/gpfs01, etc.
/mnt/clustera/gpfs01, etc.

What have other people done? Are you happy with it? What would you do differently?
Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bbanister at jumptrading.com Mon Oct 6 16:36:17 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 15:36:17 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Oct 6 16:40:45 2014 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 6 Oct 2014 15:40:45 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com> Hi Stuart, We have a very similar setup. I use /gpfs01, /gpfs02 etc. and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. 
/mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? 
Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. 
> However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? 
from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation.

Cheers,
-Bryan

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014
From: kgunda at in.ibm.com (Kalyan Gunda)
Date: Tue, 7 Oct 2014 10:33:07 +0530
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID:

Hi Bryan,

AFM supports GPFS multi-cluster, and we have customers already using this successfully. Are you using the GPFS backend? Can you explain your configuration in detail? If ls is hung it would have generated some long waiters. Maybe this should be pursued separately via a PMR. You can ping me the details directly if needed, along with opening a PMR per the IBM service process.

As far as prefetch is concerned, right now it is limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multiple nodes to pull in data based on configuration. The "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via the mmchfileset cmd (the mmchfileset pubs don't show this param for some reason, I will have that updated.)

eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5
Fileset prefetchIW changed.

List the change:
mmlsfileset fs1 prefetchIW --afm -L
Filesets in file system 'fs1':

Attributes for fileset prefetchIW:
===================================
Status                              Linked
Path                                /gpfs/fs1/prefetchIW
Id                                  36
afm-associated                      Yes
Target                              nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch
Mode                                independent-writer
File Lookup Refresh Interval        30 (default)
File Open Refresh Interval          30 (default)
Dir Lookup Refresh Interval         60 (default)
Dir Open Refresh Interval           60 (default)
Async Delay                         15 (default)
Last pSnapId                        0
Display Home Snapshots              no
Number of Gateway Flush Threads     5
Prefetch Threshold                  0 (default)
Eviction Enabled                    yes (default)

AFM parallel i/o can be set up such that multiple GW nodes can be used to pull in data. More details are available here:
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm
and this link outlines tuning params for parallel i/o along with others:
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning

Regards
Kalyan
GPFS Development
EGL D Block, Bangalore

From: Bryan Banister
To: gpfsug main discussion list
Date: 10/06/2014 09:57 PM
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations
Sent by: gpfsug-discuss-bounces at gpfsug.org

We are using 4.1.0.3 on the cluster with the AFM filesets,
-Bryan

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme
Sent: Monday, October 06, 2014 11:28 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Hi Bryan,

in 4.1 AFM uses multiple threads for reading data, this was different in 3.5. what version are you using?

thx. Sven

On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote:

Just an FYI to the GPFS user community,

We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide.

Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset:

GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching:
v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset).

We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation.
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
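[Editorial sketch, not part of the original post: for anyone triaging a similar AFM hang, a minimal first pass with standard GPFS 4.1 tooling might look like the following. The file system name gpfs_new and fileset name afm_test are made-up placeholders.]

    # Check for long waiters on the gateway and NSD nodes of the cache (new) cluster
    mmdiag --waiters

    # Show the AFM state and queue of the fileset (Active, Dirty, Disconnected, Unmounted, ...)
    mmafmctl gpfs_new getstate -j afm_test

    # Show the fileset's AFM attributes, target and mode
    mmlsfileset gpfs_new afm_test --afm -L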
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
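[Editorial sketch, not part of the original posts: a rough illustration of the parallel rsync approach described above. The mount points /gpfs_old and /gpfs_new and the eight-way parallelism are hypothetical, and a final single rsync pass (e.g. with --delete) would still be needed at cutover while applications are stopped.]

    # One rsync stream per top-level project directory, eight at a time on this node
    find /gpfs_old/projects -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | \
      xargs -P 8 -I{} rsync -aHAXS --numeric-ids /gpfs_old/projects/{}/ /gpfs_new/projects/{}/

    # Spread different top-level trees across several nodes (ssh, pdsh, or the batch
    # scheduler) to use more NSD server bandwidth than a single node can drive.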
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) 
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 09 Oct 2014 13:02:44 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
Message-ID: <54367964.1050900@ebi.ac.uk>

Hello everyone,

Suppose we want to build a new GPFS storage system using SAN-attached storage arrays, but instead of putting metadata on a shared storage array, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of server failure?

To make it more clear: if a server fails, I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data?

Thanks in advance,
Salvatore Di Nardo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Thu, 9 Oct 2014 19:31:28 +0000
Subject: [gpfsug-discuss] GPFS RFE promotion
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>

Just wanted to pass my GPFS RFE along:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

Description:
GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis.

Use case:
This could be used for a very large number of file system management applications, including:
1) SOBAR (Scale-Out Backup And Restore)
2) Data Security Auditing and Monitoring applications
3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources.
4) Application file system access profiling

Please vote for it if you feel it would also benefit your operation, thanks,
-Bryan

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From service at metamodul.com Fri Oct 10 13:21:43 2014 From: service at metamodul.com (service at metamodul.com) Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST) Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de> > Bryan Banister hat am 9. Oktober 2014 um 21:31 > geschrieben: > > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgp at psu.edu Fri Oct 10 16:04:02 2014 From: pgp at psu.edu (Phil Pishioneri) Date: Fri, 10 Oct 2014 11:04:02 -0400 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <5437F562.1080609@psu.edu> On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 15:08:04 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <5437F562.1080609@psu.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

Id like this to see hot files

On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister <bbanister at jumptrading.com> wrote:
> Hmm... I didn't think to use the DMAPI interface. That could be a nice
> option. Has anybody done this already and are there any examples we could
> look at?
>
> Thanks!
> -Bryan

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

Ben,

to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653

thx. Sven

On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca <bdeluca at gmail.com> wrote:
> Id like this to see hot files

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: Re: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>

Hi Salvatore,

We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:

* Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.)

* Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available.

* When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start -a).

I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work.

Thx
Paul Sanchez

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID: 

As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file system to gather the values:

-- heat.pol --
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) )
-- heat.pol --

Produces output similar to this:

/gpfs/.../specFile.pyc 535089836 5892
/gpfs/.../syspath.py 528685287 806
/gpfs/---/bwe.py 528160670 4607

Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values.
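If it helps, the way I kick that off is just a list-only policy run, something along these lines (the device name /gpfs/fs1 and the /tmp/fileheat prefix are only placeholders here, so substitute your own; -I defer means no external script is executed, the candidate lists are just written out):

mmapplypolicy /gpfs/fs1 -P heat.pol -f /tmp/fileheat -I defer

If I remember right the list ends up in a file named something like /tmp/fileheat.list.fh, ordered by the FILE_HEAT weight, and from there it's easy to post-process with your favourite scripting language.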
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com

Bob Oesterlin Sr Storage Engineer, Nuance Communications

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

querying this through the policy engine is far too late to do anything useful with it

On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme <oehmes at gmail.com> wrote:
> Ben,
>
> to get lists of 'Hot Files' turn File Heat on , some discussion about it
> is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653
>
> thx. Sven

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com>

I agree with Ben, I think.

I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path.

Is there a simple DMAPI daemon that would log the file system namespace changes that we could use?

If so are there any limitations?

And is it possible to set this up in an HA environment?

Thanks!
-Bryan

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: Re: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com>

We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.)

This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems.

-Paul Sanchez

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS.
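one practical note if anyone wants to experiment with an agent like this : DMAPI has to be enabled on the filesystem before a session can attach to it, which is just something like the following (gpfs01 is only an example device name here) :

mmlsfs gpfs01 -z     (shows whether DMAPI is enabled)
mmchfs gpfs01 -z yes     (enables it ; if i remember right the filesystem has to be unmounted everywhere to change this flag)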
it's a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary

just to be clear, there is no Support for this code. we obviously Support the DMAPI interface, but the code that exposes the API is nothing we provide Support for.

thx. Sven

On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister <bbanister at jumptrading.com> wrote:
> I don't want to use the ILM policy engine as that puts a direct workload
> against the metadata storage and server resources. We need something
> out-of-band, out of the file system operational path.
>
> Is there a simple DMAPI daemon that would log the file system namespace
> changes that we could use?

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com>

A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!)
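Just to make it concrete what we are trying to avoid: the only way I know to get a change list out of GPFS today is a full policy scan keyed on modification time, roughly like the sketch below (the device name, policy file name, and the one-day window are made-up placeholders for the example), and it is exactly this scan of all the file system metadata that adds the workload I mentioned:

rule 'cl0' external list 'changed' exec ''
rule 'cl1' list 'changed' show( varchar(MODIFICATION_TIME) ) where (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS

mmapplypolicy /gpfs/fs1 -P changed.pol -f /tmp/changed -I defer

And even then it only sees files that still exist at scan time, so deletes and renames slip through, which is why a proper operation log from the file system itself would be so much better.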
Thanks,
-Bryan

Ps. Hajo said that he couldn't access the RFE to vote on it:

I would like to support the RFE but I get: "You cannot access this page because you do not have the proper authority." Cheers Hajo

Here is what the RFE website states:

Bookmarkable URL: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others.
-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Sat, 11 Oct 2014 10:37:10 +0100 Subject: Re: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <5438FA46.7090902@ebi.ac.uk>

Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups, each of them with a drive, and to set 3 metadata replicas as the default.

I had not considered that the vdisks could be offline after a reboot or failure, so that's a good point, but after a failure or even a standard reboot the server and the cluster have to be checked anyway, and I always check the vdisk status, so no big deal. Your answer made me consider another thing: once I put them back online, will they be restriped automatically, or should I run 'mmrestripefs' every time to verify/correct the replicas?

I understand that using local disks sounds strange; in fact our first idea was just to add some SSDs to the shared storage, but then we considered that the SAS cables could be a huge bottleneck. The cost difference is not huge, and FusionIO locally on the servers would make the metadata just fly.

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From service at metamodul.com Sun Oct 12 17:03:56 2014 From: service at metamodul.com (MetaService) Date: Sun, 12 Oct 2014 18:03:56 +0200 Subject: Re: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <1413129836.4846.9.camel@titan>

My preferred naming convention is to use the cluster name, or part of it, as the base directory for all GPFS mounts.

Example: Clustername=c1_eum would mean that /c1_eum/ is the base directory for all GPFS filesystems of cluster c1_eum. In case a second local cluster exists, its root mount point would be /c2_eum/. Even in the case of mounting remote clusters a naming collision is not very likely.

BTW: For accessing the final directories /.../scratch ... the user should not rely on the mount points but on provided variables: CLS_HOME=/... CLS_SCRATCH=/....

hth Hajo

From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Fri, 10 Oct 2014 17:48:24 +0100 Subject: Re: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <54380DD8.2020909@ocf.co.uk>

Hi Salvatore,

Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whilst the GPFS cluster renegotiates. The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS; once you've started the disk again it will rediscover and scan the metadata for any missing updates, and these updates are then repaired/replicated again.

Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc

Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc

OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG.

This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.

-------------- next part -------------- An HTML attachment was scrubbed... URL: 
-------------- next part -------------- A non-text attachment was scrubbed... Name: lhorrocks-barlow.vcf Type: text/x-vcard Size: 388 bytes Desc: not available URL: 

From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 13 Oct 2014 13:10:17 +0200 Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Message-ID: 

GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Oct 14th 11:15-12:05 Room 18 http://sched.co/1uMYEWK

Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany

From service at metamodul.com Mon Oct 13 16:49:44 2014 From: service at metamodul.com (service at metamodul.com) Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST) Subject: Re: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany In-Reply-To: References: Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de>

Hallo Frank, the announcement is a little bit too late for me. It would be nice if you could share your talk later. cheers Hajo

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 16:23:01 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> Message-ID: <543D3FD5.1060705@ebi.ac.uk> On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs > and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. 
We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 09:22:41 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D3FD5.1060705@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: your GSS code version is very backlevel. can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all thx. 
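One way to capture everything requested here in a single pass might look like the sketch below (the recovery-group name parsing assumes the plain mmlsrecoverygroup summary layout, with numeric vdisk counts in columns two and three, and the output directory is arbitrary -- both are local assumptions to adjust):

#!/bin/bash
# Sketch: collect mmlsconfig, mmlsfs all, and a detailed per-recovery-group
# report (pdisks included) into one directory for attaching to the thread.
BIN=/usr/lpp/mmfs/bin
OUT=/tmp/gss-diag-$(date +%Y%m%d-%H%M)
mkdir -p "$OUT"

"$BIN/mmlsconfig" > "$OUT/mmlsconfig.txt"
"$BIN/mmlsfs" all > "$OUT/mmlsfs-all.txt"

# List the recovery groups, then dump each one with -L --pdisk as asked above.
# The awk filter keeps only summary rows whose vdisk counts are numeric.
"$BIN/mmlsrecoverygroup" | awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {print $1}' |
while read -r rg; do
    "$BIN/mmlsrecoverygroup" "$rg" -L --pdisk > "$OUT/recoverygroup-$rg.txt"
done

A tarball of that directory then gives one attachment per cluster instead of pasting each command's output separately.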
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug-discuss at gpfsug.org Date: 10/14/2014 08:23 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org On 14/10/14 15:51, Sven Oehme wrote: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 17:39:18 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: <543D51B6.3070602@ebi.ac.uk> Thanks in advance for your help. We have 6 RG: recovery group vdisks vdisks servers ------------------ ----------- ------ ------- gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk Check the attached file for RG details. 
Following mmlsconfig: [root at gss01a ~]# mmlsconfig Configuration data for cluster GSS.ebi.ac.uk: --------------------------------------------- myNodeConfigNumber 1 clusterName GSS.ebi.ac.uk clusterId 17987981184946329605 autoload no dmapiFileHandleSize 32 minReleaseLevel 3.5.0.11 [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] pagepool 38g nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m numaMemoryInterleave yes prefetchPct 5 maxblocksize 16m nsdRAIDTracks 128k ioHistorySize 64k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 64k nsdRAIDFastWriteFSMetadataLimit 256k nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 4096 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 syncWorkerThreads 256 tscWorkerPool 64 nsdInlineWriteMax 32k maxFilesToCache 12k maxStatCache 512 maxGeneralThreads 1280 flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 maxMBpS 16000 maxReceiverThreads 128 worker1Threads 1024 worker3Threads 32 [common] cipherList AUTHONLY socketMaxListenConnections 1500 failureDetectionTime 60 [common] adminMode central File systems in cluster GSS.ebi.ac.uk: -------------------------------------- /dev/gpfs1 For more configuration paramenters i also attached a file with the complete output of mmdiag --config. and mmlsfs: File system attributes for /dev/gpfs1: ====================================== flag value description ------------------- ------------------------ ----------------------------------- -f 32768 Minimum fragment size in bytes (system pool) 262144 Minimum fragment size in bytes (other pools) -i 512 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1000 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 8388608 Block size (other pools) -Q user;group;fileset Quotas enforced user;group;fileset Default quotas enabled --filesetdf no Fileset df enabled? -V 13.23 (3.5.0.7) File system version --create-time Tue Mar 18 16:01:24 2014 File system creation time -u yes Support for large LUNs? -z no Is DMAPI enabled? -L 4194304 Logfile size -E yes Exact mtime mount option -S yes Suppress atime mount option -K whenpossible Strict replica allocation option --fastea yes Fast external attributes enabled? 
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4 ! pagepool 40802189312 pagepoolMaxPhysMemPct 75 prefetchAggressiveness 2 prefetchAggressivenessRead -1 prefetchAggressivenessWrite -1 ! prefetchPct 5 prefetchThreads 72 readReplicaPolicy default remoteMountTimeout 10 sharedMemLimit 0 sharedMemReservePct 15 sidAutoMapRangeLength 15000000 sidAutoMapRangeStart 15000000 ! socketMaxListenConnections 1500 socketRcvBufferSize 0 socketSndBufferSize 0 statCacheDirPct 10 subnets ! syncWorkerThreads 256 tiebreaker system tiebreakerDisks tokenMemLimit 536870912 treatOSyncLikeODSync 1 tscTcpPort 1191 ! tscWorkerPool 64 uidDomain GSS.ebi.ac.uk uidExpiration 36000 unmountOnDiskFail no useDIOXW 1 usePersistentReserve 0 verbsLibName libibverbs.so verbsPorts verbsRdma disable verbsRdmaCm disable verbsRdmaCmLibName librdmacm.so verbsRdmaMaxSendBytes 16777216 verbsRdmaMinBytes 8192 verbsRdmaQpRtrMinRnrTimer 18 verbsRdmaQpRtrPathMtu 2048 verbsRdmaQpRtrSl 0 verbsRdmaQpRtrSlDynamic 0 verbsRdmaQpRtrSlDynamicTimeout 10 verbsRdmaQpRtsRetryCnt 6 verbsRdmaQpRtsRnrRetry 6 verbsRdmaQpRtsTimeout 18 verbsRdmaSend 0 verbsRdmasPerConnection 8 verbsRdmasPerNode 0 verbsRdmaTimeout 18 verifyGpfsReady 0 ! worker1Threads 1024 ! worker3Threads 32 writebehindThreshold 524288

From oehmes at us.ibm.com Tue Oct 14 18:23:50 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:23:50 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D51B6.3070602@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

You are basically running GSS 1.0 code, while the current version is GSS 2.0 (which replaced version 1.5 two months ago).

GSS 1.5 and 2.0 have several enhancements in this space, so I strongly encourage you to upgrade your systems.

If you can describe your workload a bit, there may also be additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo
> To: gpfsug main discussion list
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> Thanks in advance for your help.
>
> We have 6 RG:
> recovery group        vdisks  vdisks  servers
> ------------------  -----------  ------  -------
> gss01a                    4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                    4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                    4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                    4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                    4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                    4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
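For reference, both the recovery-group report in the attachment and a quick per-server count of these log-append waiters can be regenerated on demand. The following is only a sketch, assuming it is run as root on a node that can reach the recovery-group servers over ssh (the cluster's configured remote shell) and that the group and server names match the listing above:

    # per-recovery-group detail, i.e. the mmlsrecoverygroup output requested earlier in the thread
    for rg in gss01a gss01b gss02a gss02b gss03a gss03b; do
        mmlsrecoverygroup $rg -L --pdisk
    done

    # one line per server: host name and the number of threads currently
    # waiting on the vdisk log ('wait for permission to append to log')
    for n in gss01a gss01b gss02a gss02b gss03a gss03b; do
        echo -n "$n: "
        ssh $n "mmdiag --waiters | grep -c 'permission to append to log'"
    done

A count that stays high on a single server, as in the gss02b output in this thread, matches the fast-write-log contention described elsewhere in the discussion.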
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically > in the same row, and due to network constraints was not possible to > put them in the same subnet). The packets are routed, but should not > be a problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
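When a dump like the one above scrolls past, it is usually easier to work from an aggregated view than from the raw waiter list. The sketch below is illustrative only: it assumes passwordless ssh between the nodes (the cluster's mmlscluster output further down shows /usr/bin/ssh as its remote shell command), that mmdiag lives in the standard /usr/lpp/mmfs/bin location, and that the waiter lines follow the format shown above; the node names are taken from the mmlscluster output quoted later in the thread.

    # Summarise current waiters by reason across the GSS recovery group servers
    for n in gss01a gss01b gss02a gss02b gss03a gss03b; do
        ssh "$n" /usr/lpp/mmfs/bin/mmdiag --waiters
    done | sed -n "s/.*reason '\(.*\)'.*/\1/p" | sort | uniq -c | sort -rn

A single reason dominating the count -- here 'wait for permission to append to log' on the VdiskLogAppendCondvar -- points at serialisation on the GSS fast write log rather than at the disks behind it, which matches the explanation Sven gives later in the thread.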
URL: From zgiles at gmail.com Tue Oct 14 18:32:50 2014 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 14 Oct 2014 13:32:50 -0400 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. >> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. 
>> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. >> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? 
>> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. 
We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> [about 50 NSDThread waiters snipped -- all waiting on the same ThCond 0x7F2114005750 (VdiskLogAppendCondvar), reason 'wait for permission to append to log'; the full list is quoted earlier in the thread] >> >> Does it means that the vdisk logs are struggling?
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:38:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: i personally don't know, i am in GPFS Research, not in support :-) but have you tried to contact your sales rep ? if you are not successful with that, shoot me a direct email with details about your company name, country and customer number and i try to get you somebody to help. thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 10/14/2014 10:33 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. 
>> >> We have 6 RG: [recovery group table, mmlsconfig, mmdiag --config and mmlsfs output snipped -- identical to the output quoted earlier in the thread] >> >> Regards, >> Salvatore >> >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx.
Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> [about 50 NSDThread waiters snipped -- all waiting on the same ThCond 0x7F2114005750 (VdiskLogAppendCondvar), reason 'wait for permission to append to log'; the full list is quoted earlier in the thread] >> >> Does it means that the vdisk logs are struggling?
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
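For context on the layout Tony describes, the 'quorum desc disk' at a third site is typically a small NSD that carries only a file system descriptor replica (and can double as a tiebreaker disk for node quorum), so that the loss of either campus leaves both quorum and a descriptor copy intact. Purely as an illustration -- the NSD, device and server names here are invented -- such a disk could be described to mmcrnsd with a stanza along these lines:

    %nsd:
      nsd=site3_desc01
      device=/dev/sdX
      servers=site3-quorum-node
      usage=descOnly
      failureGroup=3

usage=descOnly keeps data and metadata off the disk, while placing it in its own failure group gives GPFS somewhere outside the two main failure domains to hold one copy of the file system descriptor.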
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
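Whether CTDB is actually in play is easy to confirm on one of the Samba nodes; a rough sketch, assuming the ctdb and Samba tools are installed in their default paths:

    ctdb version                                 # prints the 'CTDB version: ...' string Tony quotes below
    ctdb status                                  # every node in the CTDB cluster should report OK/healthy
    smbd -b | grep -i cluster                    # was smbd built with cluster support?
    testparm -s 2>/dev/null | grep -i clustering # 'clustering = Yes' when smbd runs under CTDB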
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
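Coming back to the ctdb/Samba questions above, a minimal sketch of the sort of configuration involved, assuming a RHEL-style layout and CTDB 2.5.x-era options; the node names, addresses and paths are only illustrative:

    /etc/ctdb/nodes (one private address per NAS node):
        10.0.104.11
        10.0.104.12

    /etc/ctdb/public_addresses (floating client IPs, one interface per VLAN, which covers the multiple-interface question):
        192.168.103.21/24 eth1
        192.168.104.21/24 eth0

    /etc/sysconfig/ctdb:
        CTDB_RECOVERY_LOCK=/gpfs/gpfs01/.ctdb/reclock    # must live in GPFS so every node sees the same lock
        CTDB_MANAGES_SAMBA=yes
        CTDB_MANAGES_NFS=yes                             # only if ctdb should also start/stop and monitor NFS

    smb.conf [global]:
        clustering = yes
        vfs objects = gpfs

With something like that in place ctdb fails the public addresses over between nodes and restarts Samba (and optionally NFS) as needed; the details will obviously differ between a hand-rolled cluster and the SoNAS/v7000 Unified appliances mentioned above.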
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
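To make the File Heat suggestion above concrete, a rough sketch (the device, policy and path names are invented, and the SHOW fields are just examples): heat tracking is switched on with mmchconfig, and a LIST rule weighted by FILE_HEAT run through a deferred mmapplypolicy scan dumps the hottest files:

    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # hot.pol
    RULE EXTERNAL LIST 'hot' EXEC ''
    RULE 'listHot' LIST 'hot' WEIGHT(FILE_HEAT)
         SHOW(VARCHAR(FILE_HEAT) || ' ' || VARCHAR(FILE_SIZE))

    mmapplypolicy gpfs01 -P hot.pol -f /tmp/heat -I defer

The candidates end up in /tmp/heat.list.hot with the heat value in the SHOW column. As noted in the thread, this is still the policy engine, i.e. a periodic metadata scan, not the out-of-band operation log the RFE is asking for.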
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year. 
Here is basic agenda that was provided: GPFS/Elastic Storage User Group Monday, November 17, 2014 3:00 PM-5:00 PM: GPFS/Elastic Storage User Group [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] IBM Software Defined Storage strategy update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Customer presentations [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Future directions such as object storage and OpenStack integration [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage server update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage roadmap (*NDA required) 5:00 PM: Reception Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance. I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature. Thanks! -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 76 bytes Desc: image001.gif URL: From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org> Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: > > I reviewed my RFE request again and notice that it has been marked as > ?Private? and I think this is preventing people from voting on this > RFE. 
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 24 Oct 2014 18:58:07 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com> It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem that I had with my configuration is that the NSD client cluster was not completely updated to GPFS 4.1.0-3, as there are a few nodes still running 3.5.0-20 in the cluster which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration ?requirement? isn?t documented in the Advanced Admin Guide, but it makes sense that this is required since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD Client cluster and the configuration works as desired. Thanks Kalyan and others for their feedback. Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one Gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 x " Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node. Thanks! 
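At the roughly 100 queue operations a second seen here, a 12 million file tree works out to around 120,000 seconds, i.e. the 33-plus hours mentioned below, which is why fanning the copy out matters. The parallel rsync approach referred to can be as simple as one rsync per top-level directory (a rough sketch only: the paths and parallelism factor are made up, it assumes reasonably balanced top-level directories, and a final rsync pass with --delete is still needed at cutover):

    cd /gpfs/oldfs/projects
    find . -mindepth 1 -maxdepth 1 -print0 | \
        xargs -0 -P8 -I{} rsync -a {} /gpfs/newfs/projects/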
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
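As a point of comparison, a minimal sketch of the parallel rsync approach mentioned above might look like the following, assuming the namespace can be split by top-level directory and that the same pattern is repeated on several nodes, each given its own subset of directories (the paths and the concurrency level are illustrative):

  # Run up to 8 rsyncs at a time on this node, one per top-level directory.
  # A final pass with the same command during the cutover window picks up
  # whatever changed since the previous run.
  cd /gpfs_old/projects || exit 1
  ls -d */ | xargs -P 8 -I{} rsync -aHAX --numeric-ids {} /gpfs_new/projects/{}

Note that rsync's -A/-X handling may not carry every GPFS-specific attribute (for example NFSv4 ACLs), so attribute fidelity should be verified on a sample of files before relying on this for the final sync.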
Cheers,
-Bryan

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From chair at gpfsug.org Wed Oct 29 13:59:40 2014
From: chair at gpfsug.org (Jez Tucker (Chair))
Date: Wed, 29 Oct 2014 13:59:40 +0000
Subject: [gpfsug-discuss] Storagebeers, Nov 13th
Message-ID: <5450F2CC.3070302@gpfsug.org>

Hello all,

I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members.

http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/

I'll be popping along. Hopefully see you there.

Jez

From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014
From: Jared.Baker at uwyo.edu (Jared David Baker)
Date: Wed, 29 Oct 2014 15:31:31 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com>

Hello all,

I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs.
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
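One configuration detail worth ruling out when device paths change across a reboot is an nsddevices user exit that prints a fixed list of names rather than scanning what the operating system currently presents. A minimal sketch of a dynamic variant, modelled on the sample shipped under /usr/lpp/mmfs/samples and assuming the multipath aliases keep the dcs3800u31[ab]_lun naming shown above, would be:

  #!/bin/ksh
  # /var/mmfs/etc/nsddevices (sketch only)
  # Emit a "device devtype" pair for every matching multipath alias that
  # exists right now, instead of echoing a hard-coded list.
  for dev in $(ls /dev/mapper 2>/dev/null | egrep '^dcs3800u31[ab]_lun[0-9]+$')
  do
      echo "mapper/$dev dmm"
  done
  # Returning 0 tells GPFS to use only the devices printed above and to
  # skip its default device discovery.
  return 0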
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. 
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
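For characterising the state before (or while) support is engaged, a couple of read-only checks can be useful; this is a sketch only, not a recovery procedure, and the device name is simply one of the LUNs from the earlier multipath output:

  # Does the start of the LUN still carry recognisable NSD descriptor text?
  # (Absence of a match is suggestive, not conclusive.)
  dd if=/dev/mapper/dcs3800u31a_lun0 bs=512 count=64 2>/dev/null | strings | grep -i nsd

  # Has a GPT label re-appeared on a LUN that had been given to GPFS whole?
  parted -s /dev/mapper/dcs3800u31a_lun0 print

Neither command writes to the device, so they are safe to run while the investigation is still open.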
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I?ve created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don?t seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
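For anyone landing on this thread later, Ed's checklist above boils down to a few read-only commands, collected here as a single sketch. The node and filesystem names (mminsd6, gscratch) are simply the ones visible in this thread, and the mmsdrfs copy at the end only helps when the problem is the node's GPFS configuration files rather than the disks themselves.

--
#!/bin/bash
# Read-only triage along the lines Ed suggests (sketch only; adjust names).
multipath -l                               # are the dcs3800u31* maps present?
cat /proc/partitions                       # does the kernel still see the sdX devices?
ls /dev/mapper | egrep '_lun[0-9]+'        # do the friendly multipath names exist?
/usr/lpp/mmfs/bin/mmlsnsd -X -f gscratch   # GPFS view: Device/Devtype should not be "-"

# If only this node's GPFS configuration were damaged, copying mmsdrfs from a
# healthy node (or running mmsdrrestore -p <healthy node>) is the quick fix Ed
# mentions. Here mminsd6 is assumed to be the healthy sibling NSD server.
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
--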
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
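A note on the dd output above: "EFI PART" is the signature of a GPT header, so the first sectors of dm-0 now hold a GUID partition table where GPFS would normally have left an "NSD descriptor" string (Sven shows what a healthy descriptor looks like in his next reply). A couple of read-only ways to confirm the same thing, sketched with the device name from Sven's command:

--
# Look for either a surviving NSD descriptor or a GPT signature in the first 32 KiB.
dd if=/dev/dm-0 bs=1k count=32 2>/dev/null | strings | grep -e 'NSD descriptor' -e 'EFI PART'

# Standard tools will also report a GPT label if one has been written over the LUN;
# blkid -p typically shows PTTYPE="gpt" in that case.
blkid -p /dev/dm-0
parted -s /dev/dm-0 print
--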
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
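Sven's dd example checks a single device; since all twelve LUNs look the same to mmlsnsd, a small loop (a sketch reusing the mapper names from the nsddevices output earlier in the thread) shows at a glance whether any descriptor survived:

--
#!/bin/bash
# Read-only scan of every DCS3800 LUN for a GPFS NSD descriptor.
# Prints the mapper name plus the first matching label string, if any.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    printf '%s: ' "$dev"
    dd if="$dev" bs=1k count=32 2>/dev/null | strings \
        | grep -m1 -e 'NSD descriptor' -e 'EFI PART' \
        || echo 'no recognizable label in the first 32 KiB'
done
--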
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 
active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
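For readers who have not met the /var/mmfs/etc/nsddevices user exit mentioned in the report above: it is a site-supplied script that GPFS consults during NSD device discovery, and it is expected to print one "device devtype" line per candidate device. The sketch below is written in the spirit of the script Jared posts later in this thread; the /dev/mapper naming pattern is specific to his DCS3800 multipath setup and is an assumption here.

--
#!/bin/ksh
# Minimal nsddevices sketch: list the dm-multipath aliases that back the
# NSDs so GPFS can map NSD names to local block devices. Site-specific.
CONTROLLER_REGEX='[ab]_lun[0-9]+'

for dev in $( /bin/ls /dev/mapper | egrep "$CONTROLLER_REGEX" )
do
        # "dmm" marks the device as a device-mapper multipath device
        echo "mapper/$dev dmm"
done

# Return 0 to bypass the built-in discovery (/usr/lpp/mmfs/bin/mmdevdiscover);
# returning non-zero would let the built-in discovery run as well.
return 0
--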
URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
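To put Sven's check in context: on a healthy NSD the first blocks of the device contain a readable "NSD descriptor ... created by GPFS" string, while Jared's dd output above returns only the "EFI PART" GPT signature. A small read-only sweep of that same check across all of the LUNs might look like the sketch below; the /dev/mapper names are taken from the outputs earlier in the thread and are an assumption on any other system.

--
#!/bin/ksh
# Dump the first 32 KiB of each LUN and report whether a GPFS NSD
# descriptor or a GPT signature is visible. Purely a read-only check.
for dev in /dev/mapper/dcs3800u31[ab]_lun*
do
        out=$( dd if="$dev" bs=1k count=32 2>/dev/null | strings )
        if echo "$out" | grep -q "NSD descriptor"; then
                echo "$dev: NSD descriptor present"
        elif echo "$out" | grep -q "EFI PART"; then
                echo "$dev: GPT signature found, NSD descriptor missing"
        else
                echo "$dev: no NSD descriptor or GPT signature in first 32 KiB"
        fi
done
--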
URL: From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. 
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch:

--
[root at mmmnsd5 ~]# mmlsnsd -X -f gscratch

 Disk name           NSD volume ID      Device   Devtype  Node name        Remarks
---------------------------------------------------------------------------------------------------
 dcs3800u31a_lun0    0A62001B54235577   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun0    0A62001B54235577   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun10   0A62001C542355AA   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun10   0A62001C542355AA   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun2    0A62001C54235581   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun2    0A62001C54235581   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun4    0A62001B5423558B   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun4    0A62001B5423558B   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun6    0A62001C54235595   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun6    0A62001C54235595   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun8    0A62001B5423559F   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun8    0A62001B5423559F   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun1    0A62001B5423557C   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun1    0A62001B5423557C   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun11   0A62001C542355AF   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun11   0A62001C542355AF   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun3    0A62001C54235586   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun3    0A62001C54235586   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun5    0A62001B54235590   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun5    0A62001B54235590   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun7    0A62001C5423559A   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun7    0A62001C5423559A   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun9    0A62001B542355A4   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun9    0A62001B542355A4   -        -        mminsd6.infini   (not found) server node
[root at mmmnsd5 ~]#
--

I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this.

Thanks for all help and information.

Regards,

Jared

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
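For readers following along, the nsddevices user exit Jared describes is just a shell fragment that prints "device devicetype" pairs for GPFS device discovery. A minimal sketch, modelled loosely on the sample that ships with GPFS and assuming dm-multipath aliases under /dev/mapper (the dcs3800 pattern is illustrative, not taken from Jared's actual script):

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices (illustrative sketch only)
# Print one "device devicetype" line, relative to /dev, for each disk
# that GPFS device discovery should consider. Multipath aliases are "dmm".
for dev in /dev/mapper/dcs3800*
do
    echo "mapper/$(basename $dev) dmm"
done
# return 0 : the list above is complete, skip the built-in discovery
# return 1 : also run the built-in device discovery afterwards
return 0
--

Note that, as the rest of the thread shows, the script only tells GPFS where to look; if the NSD descriptor on disk has been overwritten, the disks still come back as "(not found)".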
URL: From jonathan at buzzard.me.uk Wed Oct 29 20:32:25 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 20:32:25 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <54514ED9.9030604@buzzard.me.uk> On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 20:47:51 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:47:51 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <54514ED9.9030604@buzzard.me.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: Jonathan, which script are you talking about? Thanks, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard Sent: Wednesday, October 29, 2014 2:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
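For anyone in the same situation who wants a read-only look before running any clearing script: standard Linux tools can report whether a GPT header (primary or backup) or other signature is sitting on an NSD. A hedged sketch, reusing one of the device names from this thread and assuming gdisk and util-linux are installed:

--
# list any partition-table or filesystem signatures on the device (read-only)
wipefs /dev/mapper/dcs3800u31a_lun0

# gdisk's partition table scan reports whether a primary and/or backup
# GPT header is present; -l is read-only
gdisk -l /dev/mapper/dcs3800u31a_lun0
--

Actually clearing anything is another matter; as discussed above, make sure the secondary GPT copy is dealt with before rebuilding, and follow the script from the developerWorks thread or IBM support guidance rather than improvising.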
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk  Wed Oct 29 21:01:06 2014
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 29 Oct 2014 21:01:06 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: 
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk>
Message-ID: <54515592.4050606@buzzard.me.uk>

On 29/10/14 20:47, Jared David Baker wrote:
> Jonathan, which script are you talking about?
>

The one here:

https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25

Use it for detecting and clearing that secondary GPT table. Never used it myself, of course; my disaster was caused by an idiot admin installing a new OS without mapping the disks out, who then hit yes, yes, yes when asked if he wanted to blank the disks, and the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM, 50TB and 80 million files down the swanny.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From mark.bergman at uphs.upenn.edu  Fri Oct 31 17:10:55 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 13:10:55 -0400
Subject: [gpfsug-discuss] mapping to hostname?
Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm>

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name and the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark

From bevans at pixitmedia.com  Fri Oct 31 17:32:45 2014
From: bevans at pixitmedia.com (Barry Evans)
Date: Fri, 31 Oct 2014 17:32:45 +0000
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <5453C7BD.8030608@pixitmedia.com>

I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout.

Cheers,
Barry
ArcaStream/Pixit Media

mark.bergman at uphs.upenn.edu wrote:
> Many GPFS logs & utilities refer to nodes via their name.
>
> I haven't found an "mm*" executable that shows the mapping between that
> name and the hostname.
>
> Is there a simple method to map the designation to the node's
> hostname?
>
> Thanks,
>
> Mark
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email.
From oehmes at us.ibm.com  Fri Oct 31 18:20:40 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Fri, 31 Oct 2014 11:20:40 -0700
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: 

Hi,

the official way to do this is

mmdiag --network

thx. Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From:    mark.bergman at uphs.upenn.edu
To:      gpfsug main discussion list
Date:    10/31/2014 10:11 AM
Subject: [gpfsug-discuss] mapping to hostname?
Sent by: gpfsug-discuss-bounces at gpfsug.org

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name an the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mark.bergman at uphs.upenn.edu  Fri Oct 31 18:57:44 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 14:57:44 -0400
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700."
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi>

In the message dated: Fri, 31 Oct 2014 11:20:40 -0700,
The pithy ruminations from Sven Oehme on to hostname?> were:
=> Hi,
=>
=> the official way to do this is mmdiag --network

OK. I'm now using:

	mmdiag --network | awk '{if ( $1 ~ /

=> thx. Sven
=>
=>
=> ------------------------------------------
=> Sven Oehme
=> Scalable Storage Research
=> email: oehmes at us.ibm.com
=> Phone: +1 (408) 824-8904
=> IBM Almaden Research Lab
=> ------------------------------------------
=>
=>
=>
=> From: mark.bergman at uphs.upenn.edu
=> To: gpfsug main discussion list
=> Date: 10/31/2014 10:11 AM
=> Subject: [gpfsug-discuss] mapping to hostname?
=> Sent by: gpfsug-discuss-bounces at gpfsug.org
=>
=>
=>
=> Many GPFS logs & utilities refer to nodes via their name.
=>
=> I haven't found an "mm*" executable that shows the mapping between that
=> name an the hostname.
=>
=> Is there a simple method to map the designation to the node's
=> hostname?
=>
=> Thanks,
=>
=> Mark
=>
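A small usage sketch tying the two suggestions in this thread together. The node identifier shown is hypothetical, and the exact output layout of these commands varies by GPFS release, so treat this as a pattern rather than a recipe:

--
# official route (per Sven): dump the cluster network table and search it
# for whichever node name or identifier appears in your log message
mmdiag --network | grep -i 'c0n9'

# older habit (per Barry), undocumented but carrying the same information
mmfsadm saferdump tscomm | grep -i 'c0n9'
--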
From Sandra.McLaughlin at astrazeneca.com  Mon Oct  6 16:40:45 2014
From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M)
Date: Mon, 6 Oct 2014 15:40:45 +0000
Subject: [gpfsug-discuss] filesets and mountpoint naming
In-Reply-To: 
References: 
Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com>

Hi Stuart,

We have a very similar setup. I use /gpfs01, /gpfs02 etc.
and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. 
We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. > However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. 
This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 10:33:07 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
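For reference, a minimal sketch of the parallel rsync fan-out Bryan describes in point 4 of his message above, splitting the namespace at the top-level directories. The paths and the degree of parallelism are placeholders, and files sitting directly at the top level would still need one extra rsync pass of their own:

--
#!/bin/bash
# hypothetical old and new file system paths
SRC=/gpfs_old/projects
DST=/gpfs_new/projects

cd "$SRC" || exit 1
# one rsync per top-level directory, eight at a time; add -H/-A/-X if
# hard links, ACLs and extended attributes need to be preserved
ls -d */ | xargs -P 8 -I{} rsync -a "$SRC/{}" "$DST/{}"
--

Run from several nodes against disjoint directory lists, this is the sort of transfer Bryan is comparing the single-node mmafmctl prefetch against.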
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) 
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 09 Oct 2014 13:02:44 +0100
Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
Message-ID: <54367964.1050900@ebi.ac.uk>

Hello everyone,

Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting the metadata on shared storage we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive), and for reliability replicate the metadata across all the servers. Will this work in case of a server failure?

To make it clearer: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data?

Thanks in advance
Salvatore Di Nardo

From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Thu, 9 Oct 2014 19:31:28 +0000
Subject: [gpfsug-discuss] GPFS RFE promotion
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>

Just wanted to pass my GPFS RFE along:

http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

Description:
GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis.

Use case:
This could be used for a very large number of file system management applications, including:
1) SOBAR (Scale-Out Backup And Restore)
2) Data Security Auditing and Monitoring applications
3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources.
4) Application file system access profiling

Please vote for it if you feel it would also benefit your operation, thanks,
-Bryan
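For comparison, the policy-engine workaround that this RFE is meant to replace looks roughly like the sketch below: a periodic scan that lists everything modified since the last run. The file system path, list name and time window are illustrative, the SQL should be checked against the ILM chapter of the Advanced Admin Guide, and the scan still has to walk all of the metadata, which is exactly the overhead the RFE wants to avoid:

-- changed-files.pol --
RULE 'chg1' EXTERNAL LIST 'changed' EXEC ''
/* list anything modified within the last day; match the window to the scan interval */
RULE 'chg2' LIST 'changed' WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
-- changed-files.pol --

# run the scan and leave the results in /tmp/changes.list.changed for post-processing
mmapplypolicy /gpfs01 -P changed-files.pol -f /tmp/changes -I defer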
From service at metamodul.com Fri Oct 10 13:21:43 2014
From: service at metamodul.com (service at metamodul.com)
Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST)
Subject: [gpfsug-discuss] GPFS RFE promotion
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de>

> Bryan Banister wrote on 9 October 2014 at 21:31:
>
> Just wanted to pass my GPFS RFE along:
>
> http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

I would like to support the RFE but I get: "You cannot access this page because you do not have the proper authority."

Cheers
Hajo

From pgp at psu.edu Fri Oct 10 16:04:02 2014
From: pgp at psu.edu (Phil Pishioneri)
Date: Fri, 10 Oct 2014 11:04:02 -0400
Subject: [gpfsug-discuss] GPFS RFE promotion
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <5437F562.1080609@psu.edu>

On 10/9/14 3:31 PM, Bryan Banister wrote:
>
> Just wanted to pass my GPFS RFE along:
>
> http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458
>
> *Description*:
>
> GPFS File System Manager should provide the option to log all file and
> directory operations that occur in a file system, preferably stored in
> a TSD (Time Series Database) that could be quickly queried through an
> API interface and command line tools. ...

The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum:

On 1/3/11 10:27 AM, dWForums wrote:
> Author:
> AlokK.Dhir
>
> Message:
> We have a proof of concept which uses DMAPI to listen to and passively log filesystem changes with a non-blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working.

-Phil

From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 10 Oct 2014 15:08:04 +0000
Subject: [gpfsug-discuss] GPFS RFE promotion
In-Reply-To: <5437F562.1080609@psu.edu>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu>
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com>

Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at?

Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister wrote: > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! 
> -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > >> Hmm... I didn't think to use the DMAPI interface. That could be a nice >> option. 
Has anybody done this already and are there any examples we could >> look at? >> >> Thanks! >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >> Sent: Friday, October 10, 2014 10:04 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >> >> On 10/9/14 3:31 PM, Bryan Banister wrote: >> > >> > Just wanted to pass my GPFS RFE along: >> > >> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >> > 0458 >> > >> > >> > *Description*: >> > >> > GPFS File System Manager should provide the option to log all file and >> > directory operations that occur in a file system, preferably stored in >> > a TSD (Time Series Database) that could be quickly queried through an >> > API interface and command line tools. ... >> > >> >> The rudimentaries for this already exist via the DMAPI interface in GPFS >> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >> DeveloperWorks forum: >> >> On 1/3/11 10:27 AM, dWForums wrote: >> > Author: >> > AlokK.Dhir >> > >> > Message: >> > We have a proof of concept which uses DMAPI to listens to and passively >> logs filesystem changes with a non blocking listener. This log can be used >> to generate backup sets etc. Unfortunately, a bug in the current DMAPI >> keeps this approach from working in the case of certain events. I am told >> 3.4.0.3 may contain a fix. We will gladly share the code once it is >> working. >> >> -Phil >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, and >> to please notify the sender immediately and destroy this email and any >> attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to the >> completeness or accuracy of this email or any attachments. This email is >> for informational purposes only and does not constitute a recommendation, >> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >> or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Hi Salvatore, We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints: * Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.) * Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available. * When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start ?a). I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work. Thx Paul Sanchez From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Salvatore Di Nardo Sent: Thursday, October 09, 2014 8:03 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? Hello everyone, Suppose we want to build a new GPFS storage using SAN attached storages, but instead to put metadata in a shared storage, we want to use FusionIO PCI cards locally on the servers to speed up metadata operation( http://www.fusionio.com/products/iodrive) and for reliability, replicate the metadata in all the servers, will this work in case of server failure? To make it more clear: If a server fail i will loose also a metadata vdisk. Its the replica mechanism its reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID: As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file systems to gather the values: -- heat.pol -- define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) -- heat.pol -- Produces output similar to this: /gpfs/.../specFile.pyc 535089836 5892 /gpfs/.../syspath.py 528685287 806 /gpfs/---/bwe.py 528160670 4607 Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values. 
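A rough sketch of that step (the prefix and path are placeholders, and the exact column layout of the generated list file should be checked before sorting on it):

# scan without moving anything; the deferred list ends up in /tmp/fileheat.list.fh
mmapplypolicy /gpfs/fs1 -P heat.pol -f /tmp/fileheat -I defer

# sort on whichever column carries the FILE_HEAT value to surface the hottest files
sort -nr -k4 /tmp/fileheat.list.fh | head -50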
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > >> Id like this to see hot files >> >> On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < >> bbanister at jumptrading.com> wrote: >> >>> Hmm... I didn't think to use the DMAPI interface. That could be a nice >>> option. Has anybody done this already and are there any examples we could >>> look at? >>> >>> Thanks! >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >>> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >>> Sent: Friday, October 10, 2014 10:04 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >>> >>> On 10/9/14 3:31 PM, Bryan Banister wrote: >>> > >>> > Just wanted to pass my GPFS RFE along: >>> > >>> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >>> > 0458 >>> > >>> > >>> > *Description*: >>> > >>> > GPFS File System Manager should provide the option to log all file and >>> > directory operations that occur in a file system, preferably stored in >>> > a TSD (Time Series Database) that could be quickly queried through an >>> > API interface and command line tools. ... >>> > >>> >>> The rudimentaries for this already exist via the DMAPI interface in GPFS >>> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >>> DeveloperWorks forum: >>> >>> On 1/3/11 10:27 AM, dWForums wrote: >>> > Author: >>> > AlokK.Dhir >>> > >>> > Message: >>> > We have a proof of concept which uses DMAPI to listens to and >>> passively logs filesystem changes with a non blocking listener. This log >>> can be used to generate backup sets etc. Unfortunately, a bug in the >>> current DMAPI keeps this approach from working in the case of certain >>> events. I am told 3.4.0.3 may contain a fix. We will gladly share the >>> code once it is working. >>> >>> -Phil >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged information. 
>>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, and >>> to please notify the sender immediately and destroy this email and any >>> attachments. Email transmission cannot be guaranteed to be secure or >>> error-free. The Company, therefore, does not make any guarantees as to the >>> completeness or accuracy of this email or any attachments. This email is >>> for informational purposes only and does not constitute a recommendation, >>> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >>> or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com> We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.) This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems. -Paul Sanchez -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Tuesday, September 23, 2014 11:47 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. 
its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister wrote: > I agree with Ben, I think. > > > > I don?t want to use the ILM policy engine as that puts a direct workload > against the metadata storage and server resources. We need something > out-of-band, out of the file system operational path. > > > > Is there a simple DMAPI daemon that would log the file system namespace > changes that we could use? > > > > If so are there any limitations? > > > > And is it possible to set this up in an HA environment? > > > > Thanks! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > > > querying this through the policy engine is far to late to do any thing > useful with it > > > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > > Ben, > > > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > > > thx. Sven > > > > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > > Id like this to see hot files > > > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! > -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. 
> > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) 
Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. 
This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
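One practical note for anyone experimenting with the DMAPI route discussed above: DMAPI has to be enabled per file system before a session can receive events, and, if memory serves, the file system must be unmounted everywhere to change that flag, while mount behaviour in the presence of a registered DMAPI application is governed by settings such as dmapiMountTimeout. A minimal sketch with a placeholder file system name:

# check whether DMAPI is enabled on the file system
mmlsfs gpfs01 -z

# enable it; plan for an outage, since the file system should be unmounted on all nodes first
mmchfs gpfs01 -z yes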
From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Sat, 11 Oct 2014 10:37:10 +0100
Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
Message-ID: <5438FA46.7090902@ebi.ac.uk>

Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups, each of them with a drive, and to set 3 metadata replicas as the default. I had not considered that the vdisks could be off after a reboot or failure, so that's a good point, but after a failure or even a standard reboot the server and the cluster have to be checked anyway, and I always check the vdisk status, so no big deal.

Your answer made me consider another thing... once they are put back online, will they be restriped automatically, or should I run 'mmrestripefs' every time to verify/correct the replicas?

I understand that using local disks sounds strange; in fact our first idea was just to add some SSDs to the shared storage, but then we considered that the SAS cable could be a huge bottleneck. The cost difference is not huge, and the FusionIO sitting locally in the servers would make the metadata just fly.

On 10/10/14 17:02, Sanchez, Paul wrote:
>
> Hi Salvatore,
>
> We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:
>
> * Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps, if your workload can tolerate it, the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.)
>
> * Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available.
>
> * When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start -a).
>
> I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work.
>
> Thx
> Paul Sanchez
>
> *From:* gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of* Salvatore Di Nardo
> *Sent:* Thursday, October 09, 2014 8:03 AM
> *To:* gpfsug main discussion list
> *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable?
>
> Hello everyone,
>
> Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting the metadata on shared storage we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of server failure?
>
> To make it more clear: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data?
>
> Thanks in advance
> Salvatore Di Nardo
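For the restripe question above, here is a minimal sketch of the check-and-restart sequence Paul describes; it assumes the filesystem device is called gpfs1 (a placeholder). Starting a stopped disk already makes GPFS scan and repair the stale replicas on it, so the explicit mmrestripefs pass is the conservative extra step rather than a strict requirement:

    # list any disks that are not up/ready after the server comes back
    mmlsdisk gpfs1 -e

    # start all stopped disks; their stale metadata replicas get repaired
    mmchdisk gpfs1 start -a

    # optionally force a full re-replication/repair pass afterwards
    mmrestripefs gpfs1 -r

Apart from the extra I/O it generates, the mmrestripefs -r pass is harmless when replication is already correct, so running it after any extended outage is a reasonable habit.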
From service at metamodul.com Sun Oct 12 17:03:56 2014
From: service at metamodul.com (MetaService)
Date: Sun, 12 Oct 2014 18:03:56 +0200
Subject: [gpfsug-discuss] filesets and mountpoint naming
In-Reply-To:
References:
Message-ID: <1413129836.4846.9.camel@titan>

My preferred naming convention is to use the cluster name or part of it as the base directory for all GPFS mounts.

Example: Clustername=c1_eum would mean that /c1_eum/ would be the base directory for all cluster c1_eum GPFS filesystems. In case a second local cluster existed, its root mount point would be /c2_eum/. Even in case of mounting remote clusters a naming collision is not very likely.

BTW: For accessing the final directories (/.../scratch ...) the user should not rely on the mount points but on variables provided, e.g.:

CLS_HOME=/...
CLS_SCRATCH=/....

hth
Hajo

From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014
From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks-Barlow)
Date: Fri, 10 Oct 2014 17:48:24 +0100
Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
Message-ID: <54380DD8.2020909@ocf.co.uk>

Hi Salvatore,

Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whilst the GPFS cluster renegotiates.

The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS; once you've started the disk again it will rediscover and scan the metadata for any missing updates, and these updates are then repaired/replicated again.

Laurence Horrocks-Barlow
Linux Systems Software Engineer
OCF plc

Tel: +44 (0)114 257 2200
Fax: +44 (0)114 257 0022
Web: www.ocf.co.uk
Blog: blog.ocf.co.uk
Twitter: @ocfplc

OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG.

On 10/10/2014 17:02, Sanchez, Paul wrote:
> Hi Salvatore,
>
> We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:
>
> [...]

From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014
From: kraemerf at de.ibm.com (Frank Kraemer)
Date: Mon, 13 Oct 2014 13:10:17 +0200
Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany
Message-ID:

GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany
Oct 14th 11:15-12:05 Room 18
http://sched.co/1uMYEWK

Frank Kraemer
IBM Consulting IT Specialist / Client Technical Architect
Hechtsheimer Str. 2, 55131 Mainz
mailto:kraemerf at de.ibm.com
voice: +49171-3043699
IBM Germany

From service at metamodul.com Mon Oct 13 16:49:44 2014
From: service at metamodul.com (service at metamodul.com)
Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST)
Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany
In-Reply-To:
References:
Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de>

Hallo Frank,
the announcement is a little bit too late for me. Would be nice if you could share your speech later.
cheers
Hajo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
[...]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Tue, 14 Oct 2014 16:23:01 +0100
Subject: Re: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk>
Message-ID: <543D3FD5.1060705@ebi.ac.uk>

On 14/10/14 15:51, Sven Oehme wrote:
> it means there is contention on inserting data into the fast write log
> on the GSS Node, which could be config or workload related
> what GSS code version are you running

[root at ebi5-251 ~]# mmdiag --version

=== mmdiag: version ===
Current GPFS build: "3.5.0-11 efix1 (888041)".
Built on Jul 9 2013 at 18:03:32
Running 6 days 2 hours 10 minutes 35 secs

> and how are the nodes connected with each other (Ethernet or IB) ?

ethernet. they use the same bonding (4x10Gb/s) where the data is passing.
We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> [...]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 09:22:41 -0700
Subject: Re: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D3FD5.1060705@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk>
Message-ID:

your GSS code version is very backlevel.

can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all

thx.
Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From: Salvatore Di Nardo
To: gpfsug-discuss at gpfsug.org
Date: 10/14/2014 08:23 AM
Subject: Re: [gpfsug-discuss] wait for permission to append to log
Sent by: gpfsug-discuss-bounces at gpfsug.org

[...]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Tue, 14 Oct 2014 17:39:18 +0100
Subject: Re: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk>
Message-ID: <543D51B6.3070602@ebi.ac.uk>

Thanks in advance for your help. We have 6 RG:

recovery group      vdisks   vdisks   servers
------------------  -------  -------  -------
gss01a                    4        8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
gss01b                    4        8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
gss02a                    4        8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
gss02b                    4        8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
gss03a                    4        8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
gss03b                    4        8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk

Check the attached file for RG details.
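The per-RG detail went out as an attachment, which the list archive strips. The report Sven asked for can be regenerated on one of the GSS nodes with a loop along these lines; this is only a sketch that feeds the first column of the plain mmlsrecoverygroup summary back into the -L --pdisk form, and it assumes all recovery group names start with "gss", as in the table above:

    # dump the detailed layout, pdisks included, of every recovery group
    for rg in $(mmlsrecoverygroup | awk '$1 ~ /^gss/ {print $1}'); do
        mmlsrecoverygroup "$rg" -L --pdisk
    done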
Following mmlsconfig:

[root at gss01a ~]# mmlsconfig
Configuration data for cluster GSS.ebi.ac.uk:
---------------------------------------------
myNodeConfigNumber 1
clusterName GSS.ebi.ac.uk
clusterId 17987981184946329605
autoload no
dmapiFileHandleSize 32
minReleaseLevel 3.5.0.11
[gss01a,gss01b,gss02a,gss02b,gss03a,gss03b]
pagepool 38g
nsdRAIDBufferPoolSizePct 80
maxBufferDescs 2m
numaMemoryInterleave yes
prefetchPct 5
maxblocksize 16m
nsdRAIDTracks 128k
ioHistorySize 64k
nsdRAIDSmallBufferSize 256k
nsdMaxWorkerThreads 3k
nsdMinWorkerThreads 3k
nsdRAIDSmallThreadRatio 2
nsdRAIDThreadsPerQueue 16
nsdClientCksumTypeLocal ck64
nsdClientCksumTypeRemote ck64
nsdRAIDEventLogToConsole all
nsdRAIDFastWriteFSDataLimit 64k
nsdRAIDFastWriteFSMetadataLimit 256k
nsdRAIDReconstructAggressiveness 1
nsdRAIDFlusherBuffersLowWatermarkPct 20
nsdRAIDFlusherBuffersLimitPct 80
nsdRAIDFlusherTracksLowWatermarkPct 20
nsdRAIDFlusherTracksLimitPct 80
nsdRAIDFlusherFWLogHighWatermarkMB 1000
nsdRAIDFlusherFWLogLimitMB 5000
nsdRAIDFlusherThreadsLowWatermark 1
nsdRAIDFlusherThreadsHighWatermark 512
nsdRAIDBlockDeviceMaxSectorsKB 4096
nsdRAIDBlockDeviceNrRequests 32
nsdRAIDBlockDeviceQueueDepth 16
nsdRAIDBlockDeviceScheduler deadline
nsdRAIDMaxTransientStale2FT 1
nsdRAIDMaxTransientStale3FT 1
syncWorkerThreads 256
tscWorkerPool 64
nsdInlineWriteMax 32k
maxFilesToCache 12k
maxStatCache 512
maxGeneralThreads 1280
flushedDataTarget 1024
flushedInodeTarget 1024
maxFileCleaners 1024
maxBufferCleaners 1024
logBufferCount 20
logWrapAmountPct 2
logWrapThreads 128
maxAllocRegionsPerNode 32
maxBackgroundDeletionThreads 16
maxInodeDeallocPrefetch 128
maxMBpS 16000
maxReceiverThreads 128
worker1Threads 1024
worker3Threads 32
[common]
cipherList AUTHONLY
socketMaxListenConnections 1500
failureDetectionTime 60
[common]
adminMode central

File systems in cluster GSS.ebi.ac.uk:
--------------------------------------
/dev/gpfs1

For more configuration parameters I also attached a file with the complete output of mmdiag --config.

and mmlsfs:
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes
opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4
! pagepool 40802189312
pagepoolMaxPhysMemPct 75
prefetchAggressiveness 2
prefetchAggressivenessRead -1
prefetchAggressivenessWrite -1
! prefetchPct 5
prefetchThreads 72
readReplicaPolicy default
remoteMountTimeout 10
sharedMemLimit 0
sharedMemReservePct 15
sidAutoMapRangeLength 15000000
sidAutoMapRangeStart 15000000
! socketMaxListenConnections 1500
socketRcvBufferSize 0
socketSndBufferSize 0
statCacheDirPct 10
subnets
! syncWorkerThreads 256
tiebreaker system
tiebreakerDisks
tokenMemLimit 536870912
treatOSyncLikeODSync 1
tscTcpPort 1191
! tscWorkerPool 64
uidDomain GSS.ebi.ac.uk
uidExpiration 36000
unmountOnDiskFail no
useDIOXW 1
usePersistentReserve 0
verbsLibName libibverbs.so
verbsPorts
verbsRdma disable
verbsRdmaCm disable
verbsRdmaCmLibName librdmacm.so
verbsRdmaMaxSendBytes 16777216
verbsRdmaMinBytes 8192
verbsRdmaQpRtrMinRnrTimer 18
verbsRdmaQpRtrPathMtu 2048
verbsRdmaQpRtrSl 0
verbsRdmaQpRtrSlDynamic 0
verbsRdmaQpRtrSlDynamicTimeout 10
verbsRdmaQpRtsRetryCnt 6
verbsRdmaQpRtsRnrRetry 6
verbsRdmaQpRtsTimeout 18
verbsRdmaSend 0
verbsRdmasPerConnection 8
verbsRdmasPerNode 0
verbsRdmaTimeout 18
verifyGpfsReady 0
! worker1Threads 1024
! worker3Threads 32
writebehindThreshold 524288

From oehmes at us.ibm.com  Tue Oct 14 18:23:50 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:23:50 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D51B6.3070602@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID: 

You are basically running GSS 1.0 code, while the current version is GSS 2.0 (which replaced Version 1.5 two months ago). GSS 1.5 and 2.0 have several enhancements in this space, so I strongly encourage you to upgrade your systems.

If you can specify a bit more about what your workload is, there might also be additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo 
> To: gpfsug main discussion list 
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> Thanks in advance for your help.
>
> We have 6 RG:
> recovery group      vdisks  vdisks  servers
> ------------------  ------  ------  -------
> gss01a                   4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                   4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                   4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                   4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                   4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                   4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically > in the same row, and due to network constraints was not possible to > put them in the same subnet). The packets are routed, but should not > be a problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
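For anyone else chasing VdiskLogAppendCondvar waiters like the ones quoted above, a quick way to quantify the contention and to gather the data Sven asks for is something along these lines. This is only a rough sketch: mmdiag, mmlsrecoverygroup, mmlsconfig, mmlsfs and mmdsh are standard GPFS/GSS commands, but the output paths are arbitrary and the awk header-skipping may need adjusting to your exact output format.

    # Histogram of waiter reasons on a GSS server (run per node, or wrap
    # the command in mmdsh -N <nodelist> to hit all recovery group servers)
    mmdiag --waiters | grep -o "reason '.*'" | sort | uniq -c | sort -rn | head

    # Collect what was requested above: per-RG detail plus cluster and filesystem config
    for rg in $(mmlsrecoverygroup | awk 'NR>2 {print $1}'); do   # skip the header lines
        mmlsrecoverygroup "$rg" -L --pdisk > "/tmp/${rg}.rg.txt"
    done
    mmlsconfig > /tmp/mmlsconfig.txt
    mmlsfs all > /tmp/mmlsfs.txt

The histogram makes it obvious whether the waiters are dominated by VdiskLogAppendCondvar, which is the fast-write-log contention Sven describes earlier in the thread, or spread across other reasons.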
URL: From zgiles at gmail.com Tue Oct 14 18:32:50 2014 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 14 Oct 2014 13:32:50 -0400 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. >> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. 
>> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. >> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? 
>> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. 
We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA27A00 
waiting 0.110025022 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, >> NSDThread: 
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:38:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: i personally don't know, i am in GPFS Research, not in support :-) but have you tried to contact your sales rep ? if you are not successful with that, shoot me a direct email with details about your company name, country and customer number and i try to get you somebody to help. thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 10/14/2014 10:33 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. 
>> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. >> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. 
>> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? >> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. 
Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA27A00 
waiting 0.110025022 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, >> NSDThread: 
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
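On the single-namespace point above: one common way to carve one large GPFS filesystem into manageable pieces for that many home directories is independent filesets linked under a common path, rather than separate filesystems. A minimal sketch follows; the filesystem name (gpfs1), fileset names, junction paths and inode limits are invented for illustration and are not the actual Kingston layout.

    # Create independent filesets (each with its own inode space) and link them
    # into the namespace
    mmcrfileset gpfs1 home_staff --inode-space new --inode-limit 2000000
    mmlinkfileset gpfs1 home_staff -J /gpfs1/home/staff

    mmcrfileset gpfs1 home_students --inode-space new --inode-limit 8000000
    mmlinkfileset gpfs1 home_students -J /gpfs1/home/students

    # Verify the filesets, their junctions and inode spaces
    mmlsfileset gpfs1 -L

Independent filesets can then carry their own quotas and snapshots, which is usually the reason for choosing them over plain directories in a setup like this.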
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
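(Since none of the replies above walk through the actual plumbing behind Bill's three questions, here is a minimal sketch of a typical ctdb 2.x plus clustered Samba layout from that era; every IP, interface name and path below is invented purely for illustration, and the file locations are the usual RHEL-style ones but vary by distribution:

    # /etc/ctdb/nodes -- one line per cluster member, its private/internal IP
    10.10.0.1
    10.10.0.2

    # /etc/ctdb/public_addresses -- floating client-facing IPs, one interface per line;
    # separate entries per interface is how distinct VLANs (e.g. 1GbE and 10GbE) are handled
    192.168.104.50/24 eth0
    192.168.103.50/24 eth1

    # /etc/sysconfig/ctdb -- the recovery lock must live on GPFS so all nodes can see it
    CTDB_RECOVERY_LOCK=/gpfs01/.ctdb/reclock
    CTDB_MANAGES_SAMBA=yes
    CTDB_MANAGES_NFS=yes      # only if kernel NFS should fail over under ctdb as well

    # smb.conf fragment -- hand the TDB databases over to ctdb
    [global]
        clustering = yes

ctdb then moves the public addresses between healthy nodes, so SMB and NFS clients mount the floating IPs rather than any one server.)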
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
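(To make the File Heat suggestion above concrete: the parameter and policy attribute names below are the ones documented for GPFS 3.5/4.1, but treat this as an untested sketch and check them against your release; the file system name, policy file name and list name are made up for the example.

    # enable heat tracking cluster-wide; with these values heat decays 10% per 24-hour period
    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # hot.pol -- have the policy engine rank files by FILE_HEAT
    RULE 'hotdef' EXTERNAL LIST 'hotfiles' EXEC ''
    RULE 'hot' LIST 'hotfiles' WEIGHT(FILE_HEAT)

    # defer execution and just write the candidate list, with weights, under /tmp/hot.list.*
    mmapplypolicy gpfs01 -P hot.pol -I defer -f /tmp/hot.list

This still drives a full metadata scan through mmapplypolicy, so it answers "which files are hot" after the fact but not the out-of-band, always-on change log that the RFE is asking for.)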
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year. 
Here is the basic agenda that was provided:

GPFS/Elastic Storage User Group
Monday, November 17, 2014

3:00 PM-5:00 PM: GPFS/Elastic Storage User Group
- IBM Software Defined Storage strategy update
- Customer presentations
- Future directions such as object storage and OpenStack integration
- Elastic Storage server update
- Elastic Storage roadmap (*NDA required)
5:00 PM: Reception

Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance.

I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature.

Thanks!
-Bryan

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

-------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org>
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
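As a rough illustration of the File Heat approach Sven mentions above, the sketch below enables heat tracking and then uses a deferred policy scan to dump a list of files with a non-zero heat value. The file system name (gpfs01), the period/loss values and the exact rule wording are placeholders based on the GPFS documentation of that era, so verify the parameter names and the generated list-file name against your release before relying on it.

--
# enable file heat tracking (values are illustrative, not tuning advice)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# policy that lists files with a non-zero heat value
cat > /tmp/hot.pol <<'EOF'
RULE 'ext'  EXTERNAL LIST 'hotFiles' EXEC ''
RULE 'heat' LIST 'hotFiles' SHOW(varchar(FILE_HEAT)) WHERE FILE_HEAT > 0.0
EOF

# deferred run: writes the candidate list under /tmp instead of executing anything
mmapplypolicy gpfs01 -P /tmp/hot.pol -I defer -f /tmp/hotFiles
--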
From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 24 Oct 2014 18:58:07 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com> It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem that I had with my configuration is that the NSD client cluster was not completely updated to GPFS 4.1.0-3, as there are a few nodes still running 3.5.0-20 in the cluster which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration "requirement" isn't documented in the Advanced Admin Guide, but it makes sense that this is required since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD Client cluster and the configuration works as desired. Thanks Kalyan and others for their feedback. Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one Gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 x "Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node. Thanks!
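The mixed-version condition Bryan describes (a few 3.5.0-20 nodes blocking mmchconfig release=LATEST) is easy to check for up front. Here is a small sketch, assuming passwordless ssh and placeholder node names; mmdiag --version reports the daemon build running on a node and mmlsconfig shows the level the cluster is currently committed to:

--
# level the cluster is currently committed to
mmlsconfig minReleaseLevel

# daemon build actually running on each node (node names below are placeholders)
for n in mminsd5 mminsd6 compute001; do
    echo "== $n =="
    ssh "$n" /usr/lpp/mmfs/bin/mmdiag --version
done

# only once every node reports the target level:
# mmchconfig release=LATEST
--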
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
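To make the two-step migration sequence discussed above concrete (build a file list on the old file system with mmapplypolicy, then hand it to mmafmctl prefetch on the AFM fileset), here is a rough sketch. The paths, file system and fileset names are placeholders, and the list option spelling and expected list format have varied between releases (the thread uses --home-inode-file), so follow the Advanced Admin Guide for the level you actually run:

--
# 1) generate a candidate list from the old (home) file system
cat > /tmp/all.pol <<'EOF'
RULE 'ext' EXTERNAL LIST 'migrateList' EXEC ''
RULE 'all' LIST 'migrateList'
EOF
mmapplypolicy /gpfs/oldfs01/projects -P /tmp/all.pol -I defer -f /tmp/migrate
# mmapplypolicy reports the list file it wrote, typically /tmp/migrate.list.migrateList

# 2) feed that list to the AFM fileset on the new (cache) file system
mmafmctl newfs01 prefetch -j projects --home-inode-file /tmp/migrate.list.migrateList
--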
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
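Since parallel rsync keeps coming up as the practical fallback, a minimal sketch of the idea: split the namespace at the top-level directory boundary and keep a bounded number of rsyncs running. The source and destination paths and the concurrency are made up for the example, it assumes the top-level entries are directories, and a final pass after writers are stopped is still required:

--
cd /gpfs/oldfs01/projects
# one rsync per top-level directory, at most 8 at a time (GNU xargs -P)
ls -1 | xargs -P 8 -I{} rsync -a {}/ /gpfs/newfs01/projects/{}/

# at cut-over, stop writers and repeat the same command for a final delta pass
--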
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information.
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Wed Oct 29 13:59:40 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 29 Oct 2014 13:59:40 +0000 Subject: [gpfsug-discuss] Storagebeers, Nov 13th Message-ID: <5450F2CC.3070302@gpfsug.org> Hello all, I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members. http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/ I'll be popping along. Hopefully see you there. Jez From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 15:31:31 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
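Before changing any NSD configuration in a situation like this, it is worth confirming that the operating system still presents the LUNs and that the descriptor area on disk still looks plausible; every command below only reads from the devices. This is a rough sketch using one of the device names from the listing above, and the strings/parted output is only indicative (an unexpected partition label or missing descriptor text is a reason to open a PMR, not to start recreating NSDs):

--
# does the OS still see the multipath devices at all?
grep dm- /proc/partitions
multipath -ll | head -40
ls -l /dev/mapper/ | grep dcs3800

# read-only peek at one LUN: an intact NSD normally still shows NSD-related
# text near the start of the disk, and no freshly restored GPT label
dd if=/dev/mapper/dcs3800u31a_lun0 bs=1M count=8 2>/dev/null | strings | grep -i nsd | head
parted -s /dev/mapper/dcs3800u31a_lun0 print
--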
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. 
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script, and it produces the following output, with the user exit returning 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be found locally, so I attempted to rediscover them by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that would fix it, but upon restarting GPFS there was still no success.
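The /var/mmfs/etc/nsddevices file mentioned above is a user exit that GPFS consults during device discovery. A minimal sketch of such a script, assuming device-mapper multipath aliases named like the dcs3800u31* LUNs in the output above and modeled loosely on IBM's nsddevices sample (not copied from this cluster), could look like this:

--
#!/bin/ksh
# Sketch of a /var/mmfs/etc/nsddevices user exit.
# It prints one "device devtype" pair per NSD candidate; "dmm" marks a
# device-mapper multipath device.
# ASSUMPTION: the multipath aliases match dcs3800u31[ab]_lun<N> as in the
# output above; adjust the pattern for your own multipath bindings.

for dev in /dev/mapper/dcs3800u31[ab]_lun*
do
    [ -e "$dev" ] || continue
    # GPFS expects the name relative to /dev, hence the mapper/ prefix.
    echo "mapper/${dev##*/} dmm"
done

# Returning 0 tells GPFS to use only the list above and skip its built-in
# device discovery (this mirrors the return 0 in IBM's sample exit).
return 0
--

With a working exit the mapper/* names should end up in the Device column of mmlsnsd; here they do not, which is consistent with the later finding in this thread that the on-disk NSD descriptors themselves are gone rather than the exit being wrong.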
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, I was asking for the content, not the result :-) Can you run cat /var/mmfs/etc/nsddevices? The second output at least confirms that there is no correct label on the disk, as it returns only the EFI GPT signature ("EFI PART"). On a GNR system you get a bit more output, but at least you should see the NSD descriptor string, like I get on my system: [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s While I would still like to see the nsddevices script, I assume your NSD descriptor is wiped, and without a lot of manual labor and at least a recent GPFS dump this is very hard, if at all possible, to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx.
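Sven's dd | strings check can be repeated over every multipath alias to see at a glance which LUNs still carry an NSD descriptor; a small sketch is below, with the alias pattern assumed from the names used in this thread.

--
#!/bin/bash
# Sketch: scan the first 32 KiB of each multipath alias for an NSD
# descriptor, the same check shown above for a single disk.
# ASSUMPTION: aliases are named dcs3800u31[ab]_lun<N> as in this thread.

for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    [ -e "$dev" ] || continue
    echo "== $dev =="
    # A healthy NSD prints a line like:
    #   NSD descriptor for ... created by GPFS ...
    # A disk whose label was overwritten typically shows only "EFI PART".
    dd if="$dev" bs=1k count=32 2>/dev/null | strings | egrep 'NSD descriptor|EFI PART' || echo "(no NSD descriptor or GPT signature found)"
done
--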
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 
active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
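Ed's closing suggestion deserves one caveat: /var/mmfs/gen/mmsdrfs holds the cluster configuration, so copying it over from a healthy node repairs a damaged configuration file but does not rewrite on-disk NSD labels. A rough sketch of that recovery step, using a placeholder host name and assuming GPFS is stopped on the affected node first:

--
# Sketch only; "healthy-node" is a placeholder, not a host from this thread.
mmshutdown                                            # stop GPFS on this node
cp /var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.save   # keep the stale copy
scp healthy-node:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
mmstartup                                             # daemon re-reads the file on start
--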
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
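For anyone who has not used the nsddevices user exit before, here is the same script again with comments spelling out what each line does (behaviour unchanged from what Jared posted; the sample shipped with GPFS under /usr/lpp/mmfs/samples documents the exit in full):

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices: GPFS user exit that tells the daemon which block
# devices to probe for NSDs instead of relying on its own device scan.

# Match the dm-multipath aliases for the DCS3800 LUNs, e.g. dcs3800u31a_lun0.
CONTROLLER_REGEX='[ab]_lun[0-9]+'

for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
   # Output format is "<device relative to /dev> <device type>";
   # dmm marks a device-mapper multipath device.
   echo mapper/$dev dmm
   #echo mapper/$dev generic
done

# Bypass the built-in GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover)
# and use only the devices listed above.
return 0
--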
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. 
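The strings output Jared posted earlier, EFI PART with no NSD descriptor line, matches Sven's diagnosis exactly: a GPT header carries the ASCII signature "EFI PART" at the start of LBA 1. A hedged check across all LUNs, assuming 512-byte logical sectors on these devices:

--
# Sketch: print the 8-byte signature at LBA 1 (byte offset 512) of each LUN.
# "EFI PART" there means a GPT header has been written over the GPFS label.
# Assumes 512-byte logical sectors; adjust the offset for 4K-sector LUNs.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sig=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null)
    printf '%s: %s\n' "$dev" "${sig:-no GPT signature at LBA 1}"
done
--

A LUN still carrying its GPFS label would instead show the NSD descriptor string somewhere in its first 32 KiB, as in Sven's healthy example above.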
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
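One further thought before the file system is rebuilt: capturing the label area of each LUN ahead of any future firmware or BIOS work would at least leave something to hand to IBM support if a descriptor is ever overwritten again. This is not an official GPFS procedure, only a precaution built on the same dd commands used above, with a hypothetical backup location:

--
# Sketch: save the first 4 MiB of every NSD LUN before maintenance.
# The directory name is hypothetical; restoring a saved header should only
# ever be done under guidance from support, never blindly.
backupdir=/root/nsd-label-backup.$(date +%Y%m%d)
mkdir -p "$backupdir"
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    dd if="$dev" of="$backupdir/$(basename $dev).hdr" bs=1M count=4 2>/dev/null
done
--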
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
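For reference, a hedged sketch of the mmsdrfs transfer Ed describes; the hostname is one of the NSD servers from this thread and is purely illustrative, and the documented tool for the same job is mmsdrrestore:
--
# Copy the cluster configuration file from a healthy NSD server onto the
# rebuilt node (hostname is illustrative).
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
# Alternatively, run mmsdrrestore on the broken node; check the Administration
# Guide for the exact options in your GPFS release.
/usr/lpp/mmfs/bin/mmsdrrestore
--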
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 20:32:25 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 20:32:25 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <54514ED9.9030604@buzzard.me.uk> On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 20:47:51 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:47:51 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <54514ED9.9030604@buzzard.me.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: Jonathan, which script are you talking about? Thanks, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard Sent: Wednesday, October 29, 2014 2:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
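While waiting for the pointer to that script, a rough, assumption-laden way to look for leftover partition-table signatures on a LUN with stock util-linux tools (this is not the script Jonathan refers to, and neither command below erases anything):
--
# List any filesystem/partition-table signatures wipefs can see on the device.
wipefs /dev/mapper/dcs3800u31a_lun0
# blkid can also report the partition-table type, if any (prints "gpt"/"dos").
blkid -p -o value -s PTTYPE /dev/mapper/dcs3800u31a_lun0
--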
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Wed Oct 29 21:01:06 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 21:01:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: <54515592.4050606@buzzard.me.uk> On 29/10/14 20:47, Jared David Baker wrote: > Jonathan, which script are you talking about? > The one here https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25 Use for detecting and clearing that secondary GPT table. Never used it of course, my disaster was caused by an idiot admin installing a new OS not mapping the disks out and then hit yes yes yes when asked if he wanted to blank the disks, the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM 50TB and 80 million files down the swanny. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From mark.bergman at uphs.upenn.edu Fri Oct 31 17:10:55 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 13:10:55 -0400 Subject: [gpfsug-discuss] mapping to hostname? Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm> Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark From bevans at pixitmedia.com Fri Oct 31 17:32:45 2014 From: bevans at pixitmedia.com (Barry Evans) Date: Fri, 31 Oct 2014 17:32:45 +0000 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <5453C7BD.8030608@pixitmedia.com> I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout. Cheers, Barry ArcaStream/Pixit Media mark.bergman at uphs.upenn.edu wrote: > Many GPFS logs& utilities refer to nodes via their name. > > I haven't found an "mm*" executable that shows the mapping between that > name an the hostname. > > Is there a simple method to map the designation to the node's > hostname? > > Thanks, > > Mark > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. 
From oehmes at us.ibm.com Fri Oct 31 18:20:40 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 31 Oct 2014 11:20:40 -0700 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: Hi, the official way to do this is mmdiag --network thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: mark.bergman at uphs.upenn.edu To: gpfsug main discussion list Date: 10/31/2014 10:11 AM Subject: [gpfsug-discuss] mapping to hostname? Sent by: gpfsug-discuss-bounces at gpfsug.org Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Fri Oct 31 18:57:44 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 14:57:44 -0400 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700." References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi> In the message dated: Fri, 31 Oct 2014 11:20:40 -0700, The pithy ruminations from Sven Oehme on to hostname?> were: => Hi, => => the official way to do this is mmdiag --network OK. I'm now using: mmdiag --network | awk '{if ( $1 ~ / => thx. Sven => => => ------------------------------------------ => Sven Oehme => Scalable Storage Research => email: oehmes at us.ibm.com => Phone: +1 (408) 824-8904 => IBM Almaden Research Lab => ------------------------------------------ => => => => From: mark.bergman at uphs.upenn.edu => To: gpfsug main discussion list => Date: 10/31/2014 10:11 AM => Subject: [gpfsug-discuss] mapping to hostname? => Sent by: gpfsug-discuss-bounces at gpfsug.org => => => => Many GPFS logs & utilities refer to nodes via their name. => => I haven't found an "mm*" executable that shows the mapping between that => name an the hostname. => => Is there a simple method to map the designation to the node's => hostname? => => Thanks, => => Mark =>
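Mark's awk filter above was mangled by the list's HTML scrubbing; as a hypothetical stand-in rather than his original command, something along these lines keeps just the mmdiag --network lines that carry the node designations, since the exact column layout varies by release:
--
# Show the lines that contain the <cXnY>-style designations together with the
# hostname/IP on the same line (hypothetical reconstruction, not Mark's awk).
/usr/lpp/mmfs/bin/mmdiag --network | grep '<c'
--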
From Sandra.McLaughlin at astrazeneca.com Mon Oct 6 16:40:45 2014 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 6 Oct 2014 15:40:45 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com> Hi Stuart, We have a very similar setup. I use /gpfs01, /gpfs02 etc.
and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. 
We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. > However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. 
This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 10:33:07 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
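If the test is repeated, a small sketch of one way to sweep for long waiters on the gateway and client nodes while the hang is in progress (node names are placeholders; mmdsh or any other ssh loop would do the same):
--
# Dump waiters on the AFM gateways and a hung client while the ls is stuck.
for n in gw1 gw2 client1; do
    echo "== $n =="
    ssh "$n" /usr/lpp/mmfs/bin/mmdiag --waiters
done
--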
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change:
mmlsfileset fs1 prefetchIW --afm -L
Filesets in file system 'fs1':
Attributes for fileset prefetchIW:
===================================
Status                            Linked
Path                              /gpfs/fs1/prefetchIW
Id                                36
afm-associated                    Yes
Target                            nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch
Mode                              independent-writer
File Lookup Refresh Interval      30 (default)
File Open Refresh Interval        30 (default)
Dir Lookup Refresh Interval       60 (default)
Dir Open Refresh Interval         60 (default)
Async Delay                       15 (default)
Last pSnapId                      0
Display Home Snapshots            no
Number of Gateway Flush Threads   5
Prefetch Threshold                0 (default)
Eviction Enabled                  yes (default)

AFM parallel i/o can be set up such that multiple GW nodes can be used to pull in data. More details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning

Regards Kalyan GPFS Development EGL D Block, Bangalore

From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org

We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5. What version are you using? thx. Sven

On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation.
Cheers, -Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
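(A minimal sketch of the parallel rsync split mentioned in point 4 above. The mount points, host names, and per-top-level-directory split are assumptions for illustration; a real migration would bound the number of concurrent transfers and finish with a final pass during the cutover window.)

-- parallel-rsync.sh (illustrative only) --
#!/bin/bash
# Fan one rsync per top-level directory out across a small pool of nodes.
# /gpfs/old and /gpfs/new are assumed mount points of the two file systems.
NODES=(node01 node02 node03 node04)
i=0
for dir in /gpfs/old/*/ ; do
    node=${NODES[$(( i++ % ${#NODES[@]} ))]}
    # -a preserves mode/owner/times, -H hard links, -A ACLs, -X xattrs
    ssh "$node" rsync -aHAX --numeric-ids --delete \
        "$dir" "/gpfs/new/$(basename "$dir")/" &
done
wait
-- end --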
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding the second downtime, you are right, disabling AFM after data migration requires an unlink and hence downtime. But there is an easy workaround, where revalidation intervals can be increased to the maximum or GW nodes can be unconfigured without downtime, with the same effect. And disabling AFM can be done at a later point during a maintenance window. We plan to modify this to have this done online, i.e. without requiring an unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction.

3) The prefetch operation can only run on a single node and thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just don't cut it due to single-node bandwidth limits. When I was running the prefetch it was only executing roughly 100 "Queue numExec" operations per second. The prefetch operation for a directory with 12 million files was going to take over 33 HOURS just to process the file list!

--> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs (link provided below). In fact it can parallelize data transfer to a single file and also do multiple files in parallel depending on file sizes and various tuning params.

4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems, and do not require the complicated AFM configuration. Yes, there is of course some effort to break up the namespace across the rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance targets are met or to limit the overall impact of the operation if desired.

--> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can in fact use multiple threads on multiple nodes to do parallel i/o.

AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan

-----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Hi Bryan, AFM supports GPFS multi-cluster, and we have customers already using this successfully. Are you using a GPFS backend? Can you explain your configuration in detail? If ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per the IBM service process. As far as prefetch is concerned, right now it's limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multiple nodes to pull in data based on configuration. The "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via the mmchfileset cmd (mmchfileset pubs don't show this param for some reason, I will have that updated.)
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
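(To make the point about multiple gateway nodes and parallel prefetch concrete, a hedged sketch of the knobs involved. The file system, fileset, and node names are illustrative, and exact option spellings should be checked against the GPFS 4.1 documentation linked above.)

-- illustrative only --
# designate extra gateway nodes so AFM parallel i/o has nodes to fan out across
mmchnode --gateway -N gw01,gw02,gw03,gw04
# raise the per-fileset flush/prefetch thread count, as shown earlier in the thread
mmchfileset fs1 prefetchIW -p afmnumflushthreads=8
# build the file list on the old file system with mmapplypolicy, then queue the prefetch
mmafmctl fs1 prefetch -j prefetchIW --home-inode-file /tmp/prefetch.list
-- end --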
Cheers, -Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 09 Oct 2014 13:02:44 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
Message-ID: <54367964.1050900@ebi.ac.uk>

Hello everyone, Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting metadata on shared storage, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of a server failure? To make it more clear: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo

From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 9 Oct 2014 19:31:28 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>

Just wanted to pass my GPFS RFE along: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

Description: GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis.

Use case: This could be used for a very large number of file system management applications, including: 1) SOBAR (Scale-Out Backup And Restore) 2) Data Security Auditing and Monitoring applications 3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources. 4) Application file system access profiling

Please vote for it if you feel it would also benefit your operation, thanks, -Bryan
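(For context on the status quo this RFE wants to replace: finding recent changes today generally means an mmapplypolicy scan over all of the metadata. A minimal sketch, following the list-rule style used later in this thread; the file system name and one-day window are illustrative.)

-- recent-changes.pol (illustrative only) --
RULE 'ext' EXTERNAL LIST 'changed' EXEC ''
/* list every file modified in the last day; the policy engine still scans all inodes */
RULE 'recent' LIST 'changed' WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
-- end --

mmapplypolicy gpfs01 -P recent-changes.pol -I defer -f /tmp/changed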
From service at metamodul.com Fri Oct 10 13:21:43 2014 From: service at metamodul.com (service at metamodul.com) Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST) Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de>

> Bryan Banister wrote on 9 October 2014 at 21:31: > > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > >

I would like to support the RFE but I get: "You cannot access this page because you do not have the proper authority." Cheers Hajo

From pgp at psu.edu Fri Oct 10 16:04:02 2014 From: pgp at psu.edu (Phil Pishioneri) Date: Fri, 10 Oct 2014 11:04:02 -0400 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <5437F562.1080609@psu.edu>

On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... >

The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil

From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 15:08:04 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <5437F562.1080609@psu.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com>

Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister wrote: > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! 
> -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > >> Hmm... I didn't think to use the DMAPI interface. That could be a nice >> option. 
Has anybody done this already and are there any examples we could >> look at? >> >> Thanks! >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >> Sent: Friday, October 10, 2014 10:04 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >> >> On 10/9/14 3:31 PM, Bryan Banister wrote: >> > >> > Just wanted to pass my GPFS RFE along: >> > >> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >> > 0458 >> > >> > >> > *Description*: >> > >> > GPFS File System Manager should provide the option to log all file and >> > directory operations that occur in a file system, preferably stored in >> > a TSD (Time Series Database) that could be quickly queried through an >> > API interface and command line tools. ... >> > >> >> The rudimentaries for this already exist via the DMAPI interface in GPFS >> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >> DeveloperWorks forum: >> >> On 1/3/11 10:27 AM, dWForums wrote: >> > Author: >> > AlokK.Dhir >> > >> > Message: >> > We have a proof of concept which uses DMAPI to listens to and passively >> logs filesystem changes with a non blocking listener. This log can be used >> to generate backup sets etc. Unfortunately, a bug in the current DMAPI >> keeps this approach from working in the case of certain events. I am told >> 3.4.0.3 may contain a fix. We will gladly share the code once it is >> working. >> >> -Phil >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, and >> to please notify the sender immediately and destroy this email and any >> attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to the >> completeness or accuracy of this email or any attachments. This email is >> for informational purposes only and does not constitute a recommendation, >> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >> or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>

Hi Salvatore, We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:

* Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.)

* Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available.

* When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start -a).

I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work. Thx Paul Sanchez

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Salvatore Di Nardo Sent: Thursday, October 09, 2014 8:03 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?

Hello everyone, Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting metadata on shared storage, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of a server failure? To make it more clear: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo

From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID:

As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file system to gather the values:

-- heat.pol --
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) )
-- heat.pol --

Produces output similar to this:

/gpfs/.../specFile.pyc 535089836 5892
/gpfs/.../syspath.py 528685287 806
/gpfs/---/bwe.py 528160670 4607

Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values.
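(A hedged note on how the FILE_HEAT values above get populated and how the policy is typically run; the file system name, period, and loss-percent values are illustrative, not recommendations.)

-- illustrative only --
# file heat must be enabled before FILE_HEAT attributes are maintained
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10
# run the policy in list mode; with EXEC '' and -I defer the matched files
# are written to list files under the prefix given by -f
mmapplypolicy gpfs01 -P heat.pol -I defer -f /tmp/fileheat
-- end --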
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications

From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID:

Querying this through the policy engine is far too late to do anything useful with it.

On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven

From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com>

I agree with Ben, I think. I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com> We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.) This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems. -Paul Sanchez -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Tuesday, September 23, 2014 11:47 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. 
its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister wrote: > I agree with Ben, I think. > > > > I don?t want to use the ILM policy engine as that puts a direct workload > against the metadata storage and server resources. We need something > out-of-band, out of the file system operational path. > > > > Is there a simple DMAPI daemon that would log the file system namespace > changes that we could use? > > > > If so are there any limitations? > > > > And is it possible to set this up in an HA environment? > > > > Thanks! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > > > querying this through the policy engine is far to late to do any thing > useful with it > > > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > > Ben, > > > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > > > thx. Sven > > > > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > > Id like this to see hot files > > > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! > -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. 
> -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com>

A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!)
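(For background on the dependency described here: DMAPI has to be enabled per file system, and mounts can then wait on a DMAPI session. A hedged sketch; the file system name and timeout value are illustrative.)

-- illustrative only --
# DMAPI must be enabled on the file system before any event listener can attach
mmchfs gpfs01 -z yes
mmlsfs gpfs01 -z          # verify the setting
# how long a mount will wait for a DMAPI session/disposition to appear (seconds)
mmchconfig dmapiMountTimeout=60
-- end --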
Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. 
This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Sat, 11 Oct 2014 10:37:10 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <5438FA46.7090902@ebi.ac.uk> Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups, each of them with a drive, and 3 metadata replicas set as the default. I have not considered that the vdisks could be off after a 'reboot' or failure, so that's a good point, but anyway, after a failure or even a standard reboot the server and the cluster have to be checked anyway, and I always check the vdisk status, so no big deal. Your answer made me consider another thing... once they are put back online, will they be restriped automatically, or should I run 'mmrestripefs' every time to verify/correct the replicas? I understand that using local disks sounds strange; in fact our first idea was just to add some SSDs to the shared storage, but then we considered that the SAS cables could be a huge bottleneck. The cost difference is not huge, and the FusionIO cards locally on the servers would make the metadata just fly. On 10/10/14 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start -a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable?
> > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Sun Oct 12 17:03:56 2014 From: service at metamodul.com (MetaService) Date: Sun, 12 Oct 2014 18:03:56 +0200 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <1413129836.4846.9.camel@titan> My preferred naming convention is to use the cluster name or part of it as the base directory for all GPFS mounts. Example: Clustername=c1_eum would mean that: /c1_eum/ would be the base directory for all Cluster c1_eum GPFSs In case a second local cluster would exist its root mount point would be /c2_eum/ Even in case of mounting remote clusters a naming collision is not very likely. BTW: For accessing the the final directories /.../scratch ... the user should not rely on the mount points but on given variables provided. CLS_HOME=/... CLS_SCRATCH=/.... hth Hajo From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Fri, 10 Oct 2014 17:48:24 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <54380DD8.2020909@ocf.co.uk> Hi Salvatore, Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whist the GPFS cluster renegotiates. The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS, once you've started the disk again it will rediscover and scan the metadata for any missing updates, these updates are then repaired/replicated again. Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 10/10/2014 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. 
You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start ?a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable? > > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lhorrocks-barlow.vcf Type: text/x-vcard Size: 388 bytes Desc: not available URL: From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 13 Oct 2014 13:10:17 +0200 Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Message-ID: GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Oct 14th 11:15-12:05 Room 18 http://sched.co/1uMYEWK Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From service at metamodul.com Mon Oct 13 16:49:44 2014 From: service at metamodul.com (service at metamodul.com) Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST) Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany In-Reply-To: References: Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de> Hallo Frank, the announcement is a little bit to late for me. Would be nice if you could share your speech later. cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 16:23:01 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> Message-ID: <543D3FD5.1060705@ebi.ac.uk> On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs > and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. 
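In case it helps, this is how I usually double-check that part on a GSS node (a rough sketch; bond0 is an assumption, the bond interface may be named differently here):

   # bonding mode, slave state and negotiated speed of the data bond
   cat /proc/net/bonding/bond0

   # how GPFS itself sees its connections to the other nodes
   mmdiag --network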
We don't have a dedicated admin network. [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnets because of datacenter constraints (they are not physically in the same row, and due to network constraints it was not possible to put them in the same subnet). The packets are routed, but that should not be a problem as there is 160Gb/s of bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters?
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 09:22:41 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D3FD5.1060705@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: your GSS code version is very backlevel. can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all thx. 
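if it helps, something like this should capture everything in one go (just a sketch, i am guessing the recovery group names follow your node names, so substitute whatever a plain mmlsrecoverygroup prints):

   # run on one of the GSS servers
   for rg in gss01a gss01b gss02a gss02b gss03a gss03b ; do
       mmlsrecoverygroup $rg -L --pdisk
   done > /tmp/gss_rg_details.txt
   mmlsconfig > /tmp/gss_mmlsconfig.txt
   mmlsfs all > /tmp/gss_mmlsfs.txt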
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug-discuss at gpfsug.org Date: 10/14/2014 08:23 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org On 14/10/14 15:51, Sven Oehme wrote: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
Does it mean that the vdisk logs are struggling?
Regards, Salvatore
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 17:39:18 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: <543D51B6.3070602@ebi.ac.uk>
Thanks in advance for your help.
We have 6 RG:
recovery group vdisks vdisks servers
------------------ ----------- ------ -------
gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
Check the attached file for RG details.
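As a side note, waiter storms like the one quoted above are easier to reason about when tallied per host and wait reason rather than eyeballed. The following is only a minimal sketch, not anything specific to this cluster: it assumes the raw waiter text (for example the output of 'mmdiag --waiters' captured on each GSS server, with the host name prefixed as in the listings above) is piped in on stdin, and it simply counts waiters per (host, reason) and reports the longest wait seen for each.

#!/usr/bin/env python
# Sketch: summarise GPFS waiter dumps by host and wait reason.
# Input: raw waiter text on stdin, e.g. saved 'mmdiag --waiters' output
# prefixed with the host name, as in the listings quoted in this thread.
import re
import sys
from collections import defaultdict

# Matches: "host: 0xADDR waiting N.NNN seconds, ... reason 'some reason'"
WAITER = re.compile(
    r"(?P<host>\S+):\s+0x[0-9A-Fa-f]+\s+waiting\s+(?P<secs>[0-9.]+)\s+seconds,"
    r".*?reason\s+'(?P<reason>[^']+)'")

counts = defaultdict(int)      # (host, reason) -> number of waiters
longest = defaultdict(float)   # (host, reason) -> longest wait seen

for m in WAITER.finditer(sys.stdin.read()):
    key = (m.group("host"), m.group("reason"))
    counts[key] += 1
    if float(m.group("secs")) > longest[key]:
        longest[key] = float(m.group("secs"))

for (host, reason), n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print("%5d waiters  max %10.6fs  %-18s %s" % (n, longest[(host, reason)], host, reason))

Fed the dump quoted above, it would show that essentially all of the NSDThread waiters on gss02b.ebi.ac.uk are blocked on 'wait for permission to append to log', which is the pattern being discussed in this thread.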
Following mmlsconfig: [root at gss01a ~]# mmlsconfig Configuration data for cluster GSS.ebi.ac.uk: --------------------------------------------- myNodeConfigNumber 1 clusterName GSS.ebi.ac.uk clusterId 17987981184946329605 autoload no dmapiFileHandleSize 32 minReleaseLevel 3.5.0.11 [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] pagepool 38g nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m numaMemoryInterleave yes prefetchPct 5 maxblocksize 16m nsdRAIDTracks 128k ioHistorySize 64k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 64k nsdRAIDFastWriteFSMetadataLimit 256k nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 4096 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 syncWorkerThreads 256 tscWorkerPool 64 nsdInlineWriteMax 32k maxFilesToCache 12k maxStatCache 512 maxGeneralThreads 1280 flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 maxMBpS 16000 maxReceiverThreads 128 worker1Threads 1024 worker3Threads 32 [common] cipherList AUTHONLY socketMaxListenConnections 1500 failureDetectionTime 60 [common] adminMode central File systems in cluster GSS.ebi.ac.uk: -------------------------------------- /dev/gpfs1 For more configuration paramenters i also attached a file with the complete output of mmdiag --config. and mmlsfs: File system attributes for /dev/gpfs1: ====================================== flag value description ------------------- ------------------------ ----------------------------------- -f 32768 Minimum fragment size in bytes (system pool) 262144 Minimum fragment size in bytes (other pools) -i 512 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1000 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 8388608 Block size (other pools) -Q user;group;fileset Quotas enforced user;group;fileset Default quotas enabled --filesetdf no Fileset df enabled? -V 13.23 (3.5.0.7) File system version --create-time Tue Mar 18 16:01:24 2014 File system creation time -u yes Support for large LUNs? -z no Is DMAPI enabled? -L 4194304 Logfile size -E yes Exact mtime mount option -S yes Suppress atime mount option -K whenpossible Strict replica allocation option --fastea yes Fast external attributes enabled? 
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4 ! pagepool 40802189312 pagepoolMaxPhysMemPct 75 prefetchAggressiveness 2 prefetchAggressivenessRead -1 prefetchAggressivenessWrite -1 ! prefetchPct 5 prefetchThreads 72 readReplicaPolicy default remoteMountTimeout 10 sharedMemLimit 0 sharedMemReservePct 15 sidAutoMapRangeLength 15000000 sidAutoMapRangeStart 15000000 ! socketMaxListenConnections 1500 socketRcvBufferSize 0 socketSndBufferSize 0 statCacheDirPct 10 subnets ! syncWorkerThreads 256 tiebreaker system tiebreakerDisks tokenMemLimit 536870912 treatOSyncLikeODSync 1 tscTcpPort 1191 ! tscWorkerPool 64 uidDomain GSS.ebi.ac.uk uidExpiration 36000 unmountOnDiskFail no useDIOXW 1 usePersistentReserve 0 verbsLibName libibverbs.so verbsPorts verbsRdma disable verbsRdmaCm disable verbsRdmaCmLibName librdmacm.so verbsRdmaMaxSendBytes 16777216 verbsRdmaMinBytes 8192 verbsRdmaQpRtrMinRnrTimer 18 verbsRdmaQpRtrPathMtu 2048 verbsRdmaQpRtrSl 0 verbsRdmaQpRtrSlDynamic 0 verbsRdmaQpRtrSlDynamicTimeout 10 verbsRdmaQpRtsRetryCnt 6 verbsRdmaQpRtsRnrRetry 6 verbsRdmaQpRtsTimeout 18 verbsRdmaSend 0 verbsRdmasPerConnection 8 verbsRdmasPerNode 0 verbsRdmaTimeout 18 verifyGpfsReady 0 ! worker1Threads 1024 ! worker3Threads 32 writebehindThreshold 524288 From oehmes at us.ibm.com Tue Oct 14 18:23:50 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:23:50 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D51B6.3070602@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: you basically run GSS 1.0 code , while in the current version is GSS 2.0 (which replaced Version 1.5 2 month ago) GSS 1.5 and 2.0 have several enhancements in this space so i strongly encourage you to upgrade your systems. if you can specify a bit what your workload is there might also be additional knobs we can turn to change the behavior. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 09:40 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > Thanks in advance for your help. > > We have 6 RG: > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk > gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk > gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk > gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk > gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk > gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk > > Check the attached file for RG details. 
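
For reference, the recovery group detail attached above and the mmlsconfig / mmlsfs listings quoted below all come from standard GPFS commands, so they can be regenerated on any of the GSS servers. A minimal collection sketch follows; the output directory name is an assumption, and the recovery group names are simply the six shown in the listing above:

#!/bin/bash
# Collect the listings referenced in this thread from one GSS/NSD server.
OUT=/tmp/gss-diag.$(date +%Y%m%d-%H%M%S)     # assumed output location
mkdir -p "$OUT"

mmlsconfig      > "$OUT/mmlsconfig.txt"      # cluster configuration
mmlsfs all      > "$OUT/mmlsfs-all.txt"      # file system attributes
mmdiag --config > "$OUT/mmdiag-config.txt"   # effective daemon settings

# Per recovery group detail, including pdisk free space and state
for rg in gss01a gss01b gss02a gss02b gss03a gss03b; do
    mmlsrecoverygroup "$rg" -L --pdisk > "$OUT/mmlsrecoverygroup-$rg.txt"
done

echo "Listings written to $OUT"
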
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically > in the same row, and due to network constraints was not possible to > put them in the same subnet). The packets are routed, but should not > be a problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zgiles at gmail.com Tue Oct 14 18:32:50 2014 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 14 Oct 2014 13:32:50 -0400 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. >> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. 
>> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. >> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? 
>> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. 
We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA27A00 
waiting 0.110025022 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, >> NSDThread: 
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
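
One low-impact way to tell whether these log-append waiters are a short burst or a sustained backlog is to sample them over time on the affected server. A rough sketch, where the 5-second interval and 12 samples are assumptions rather than anything from the thread:

# Count NSD threads blocked on VdiskLogAppendCondvar every 5 seconds
for i in $(seq 1 12); do
    printf '%s  blocked=%s\n' "$(date '+%H:%M:%S')" \
        "$(mmdiag --waiters | grep -c VdiskLogAppendCondvar)"
    sleep 5
done

A count that stays well above zero across samples points at sustained contention on the fast write log rather than an isolated spike.
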
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:38:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: i personally don't know, i am in GPFS Research, not in support :-) but have you tried to contact your sales rep ? if you are not successful with that, shoot me a direct email with details about your company name, country and customer number and i try to get you somebody to help. thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 10/14/2014 10:33 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. 
>> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. >> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. 
>> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? >> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. 
Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>>
>> Does it mean that the vdisk logs are struggling?
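For anyone trying to make sense of a waiter dump like the one above, it can help to tally the waiters by reason before drawing conclusions. A rough, illustrative way to do that from a shell -- assuming the output of 'mmdiag --waiters' has been saved to a file named waiters.txt, which is a made-up name here -- is:

# count waiters per wait reason
grep -o "reason '[^']*'" waiters.txt | sort | uniq -c | sort -rn

# show the longest individual waits
grep -o 'waiting [0-9.]* seconds' waiters.txt | sort -k2,2 -rn | head

If nearly every NSD worker thread is queued on the same VdiskLogAppendCondvar, as in the output above, that would suggest the recovery group's log vdisk is the shared choke point rather than the data vdisks themselves, although only the GSS/GNR counters can really confirm that.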
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
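For readers following the CTDB discussion who have not built one of these clusters, the moving parts are fairly small. A minimal sketch of a clustered Samba-on-GPFS setup, using CTDB 2.x era option names, might look roughly like the following. The addresses, interface names, netbios name and lock path are invented examples (not Kingston's actual settings), and these snippets are meant to be merged into existing configuration rather than pasted blindly:

# Sketch only: ctdb needs a recovery lock on storage every node can see, i.e. on the GPFS filesystem
cat > /etc/sysconfig/ctdb <<'EOF'
CTDB_RECOVERY_LOCK=/gpfs/ctdb/.ctdb.lock
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_NFS=yes
EOF

# Private cluster addresses of every node, identical file on all nodes
cat > /etc/ctdb/nodes <<'EOF'
10.10.104.1
10.10.104.2
EOF

# Floating service IPs that ctdb moves between healthy nodes; different VLANs can be tied to different NICs
cat > /etc/ctdb/public_addresses <<'EOF'
192.168.104.10/24 eth0
192.168.103.10/24 eth1
EOF

# smb.conf: the essential part is "clustering = yes" so all smbd instances share their TDBs via ctdb
cat >> /etc/samba/smb.conf <<'EOF'
[global]
    clustering = yes
    netbios name = GPFS-NAS
    security = ads
EOF

That is only the skeleton; the AD join, id mapping and the exported shares are where most of the site-specific work lives.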
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
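Since "File Heat" comes up a few times in this thread, here is a rough sketch of what that route looks like in practice. Everything below is illustrative only: the period and loss values, the policy file name, the list prefix and the file system name (gpfs01) are invented, and the usual caveat applies -- a policy scan is still an in-band metadata workload, which is exactly the objection raised above.

# enable file heat tracking (heat decays over the configured period)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# a small policy that lists files weighted by their heat
cat > /tmp/hotfiles.pol <<'EOF'
RULE EXTERNAL LIST 'hotfiles' EXEC ''
RULE 'hot' LIST 'hotfiles' WEIGHT(FILE_HEAT) SHOW(VARCHAR(FILE_HEAT))
EOF

# run the scan; -I defer only writes the candidate lists (prefixed /tmp/hotlist) without acting on them
mmapplypolicy gpfs01 -P /tmp/hotfiles.pol -I defer -f /tmp/hotlist

The candidate list written for the LIST rule can then be sorted on the heat column to get the "hot files" view discussed above.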
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year.
Here is the basic agenda that was provided:

GPFS/Elastic Storage User Group
Monday, November 17, 2014

3:00 PM-5:00 PM: GPFS/Elastic Storage User Group
- IBM Software Defined Storage strategy update
- Customer presentations
- Future directions such as object storage and OpenStack integration
- Elastic Storage server update
- Elastic Storage roadmap (*NDA required)

5:00 PM: Reception

Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance.

I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature. Thanks! -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 76 bytes Desc: image001.gif URL:
From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org> Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: > > I reviewed my RFE request again and notice that it has been marked as > 'Private' and I think this is preventing people from voting on this > RFE.
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
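For anyone who wants to follow up on the File Heat suggestion quoted above, a minimal sketch of pulling a hot-file list with the policy engine once file heat tracking is enabled might look like the following. The period, loss percent, file system name and list name are illustrative assumptions, not values taken from this thread:

--
# Enable file heat tracking (example values only)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# Policy that lists files weighted by FILE_HEAT
cat > hotfiles.pol <<'EOF'
RULE EXTERNAL LIST 'hot' EXEC ''
RULE 'listhot' LIST 'hot' WEIGHT(FILE_HEAT) SHOW(FILE_HEAT)
EOF

# Generate the list without moving any data; the result lands in /tmp/heat.list.hot
mmapplypolicy gpfs01 -P hotfiles.pol -f /tmp/heat -I defer
--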
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 24 Oct 2014 18:58:07 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com> It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem that I had with my configuration is that the NSD client cluster was not completely updated to GPFS 4.1.0-3, as there are a few nodes still running 3.5.0-20 in the cluster which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration "requirement" isn't documented in the Advanced Admin Guide, but it makes sense that this is required since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD Client cluster and the configuration works as desired. Thanks Kalyan and others for their feedback. Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one Gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 "Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node. Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
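As an aside for readers following along: even with more flush threads and multiple gateway nodes configured, the prefetch itself is still driven by one job per fileset, fed with the file list produced by mmapplypolicy. A hedged sketch, where the fileset name matches the example above and the list path is an assumption:

--
# Hand the policy-generated file list to the single prefetch job for this fileset
mmafmctl fs1 prefetch -j prefetchIW --home-inode-file /tmp/oldfs-files.list
--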
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
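A rough sketch of the parallel rsync approach mentioned just above, splitting the old file system's top-level directories across concurrent workers; the paths and worker count are hypothetical, and a final catch-up pass (plus -A/-X if ACLs and extended attributes matter) would still be needed during the cutover window:

--
# Run 8 rsync workers, one per top-level directory of the old file system
cd /gpfs_old/projects
ls -d */ | xargs -n 1 -P 8 -I{} \
    rsync -a --numeric-ids /gpfs_old/projects/{} /gpfs_new/projects/{}
--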
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Wed Oct 29 13:59:40 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 29 Oct 2014 13:59:40 +0000 Subject: [gpfsug-discuss] Storagebeers, Nov 13th Message-ID: <5450F2CC.3070302@gpfsug.org> Hello all, I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members. http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/ I'll be popping along. Hopefully see you there. Jez From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 15:31:31 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
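As a side note on the nsddevices user exit shown in the quoted output above, and on Sven's earlier point about nsddevices files that merely print a static list: a dynamically generated variant, loosely based on the sample shipped as /usr/lpp/mmfs/samples/nsddevices.sample, might look like the sketch below. The dm-multipath alias pattern is an assumption for this particular DCS3850 setup:

--
# /var/mmfs/etc/nsddevices -- build the candidate device list on the fly
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    [ -e "$dev" ] && echo "mapper/$(basename "$dev") dmm"
done
# The script is sourced by the GPFS discovery code, hence 'return':
# returning 0 means use only the devices printed above, while a non-zero
# return also lets the built-in device discovery run.
return 0
--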
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. 
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
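To make Sven's DiskGPTRecovery suggestion above concrete, here is a sketch of querying and changing that BIOS setting with IBM's Advanced Settings Utility. The value name "None" is an assumption read from the showvalues output he posted; confirm the allowed values on your own hardware before setting anything, and remember his caveat that every server with access to the disks needs the change.

--
# Query the current value and its allowed values (as in Sven's example above)
asu64 showvalues DiskGPTRecovery.DiskGPTRecovery
# Then set it; "None" is assumed from the output shown in this thread,
# so verify it against your own showvalues listing first.
asu64 set DiskGPTRecovery.DiskGPTRecovery None
--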
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
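A quick sketch of Ed's /var/mmfs/gen/mmsdrfs suggestion above, for the case where a node has merely lost its local GPFS configuration; note that this does not repair on-disk NSD descriptors, which is the separate problem being chased in this thread. The node name is a placeholder, and the mmsdrrestore options should be checked against the man page before use.

--
# Copy the cluster configuration file from a node that still has a good copy
scp goodnode:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
# ...or have GPFS restore it for you (goodnode is a placeholder name)
mmsdrrestore -p goodnode
--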
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I?ve created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don?t seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
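Sven asks further down for the content of the nsddevices user exit, so for readers of the archive here is a rough sketch of a script that would produce the "mapper/... dmm" listing shown above. It is a guess modelled on the sample shipped with GPFS, not Jared's actual script, and the device name pattern is an assumption taken from the multipath aliases in this thread.

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices (sketch): list device-mapper multipath devices
# for GPFS NSD discovery. Output format is "<device relative to /dev> <type>",
# where dmm means a device-mapper multipath device.
for dev in $(ls /dev/mapper | egrep 'dcs3800u31[ab]_lun[0-9]+'); do
    echo "mapper/$dev dmm"
done
# Returning 0 tells GPFS to use only the devices listed above and skip its
# built-in discovery (the "user return 0" Jared mentions); return non-zero
# instead if the built-in discovery should also run.
return 0
--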
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
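On the stanza files mentioned earlier in the thread, and the recreate-the-NSDs question that follows below: for reference, an mmcrnsd stanza usually looks something like the sketch here. The values are illustrative only, and whether re-running mmcrnsd over disks that already carry file system data is safe is exactly the question to put to IBM support rather than something to try from this example.

--
# Illustrative stanza only -- names, servers and usage below are placeholders
%nsd:
  device=/dev/mapper/dcs3800u31a_lun0
  nsd=dcs3800u31a_lun0
  servers=mminsd5,mminsd6
  usage=dataAndMetadata
  failureGroup=-1
  pool=system
--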
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
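An aside on the stable names in the multipath -l listing above: they normally come from alias entries in /etc/multipath.conf keyed on each LUN's WWID, roughly as sketched below. The WWID shown is copied from the output above for dcs3800u31a_lun0; the stanza itself is illustrative, not this site's actual configuration.

--
multipaths {
    multipath {
        # WWID taken from the multipath -l output above
        wwid  360080e500029600c000001da53cf7ec1
        alias dcs3800u31a_lun0
    }
    # ...one multipath { } entry per LUN, so the mapper names (and hence
    # the nsddevices listing) stay stable across reboots...
}
--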
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
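For anyone following the same debugging path, the dd check above extends naturally to every LUN at once. A rough sketch, assuming the /dev/mapper naming used in this thread (adjust the pattern for other setups); on a healthy NSD the grep should match the same kind of "NSD descriptor for ... created by GPFS" line shown above, while an empty result on every LUN points at overwritten disk headers rather than a discovery problem:

--
#!/bin/bash
# Sketch only: scan the DCS3800 LUNs under /dev/mapper and report whether the
# first 32 KiB of each still contains a GPFS NSD descriptor string.
# The "[ab]_lun" pattern matches the naming used in this thread; adjust as needed.
for dev in /dev/mapper/*[ab]_lun*; do
    if dd if="$dev" bs=1k count=32 2>/dev/null | strings | grep -q "NSD descriptor"; then
        echo "$dev: NSD descriptor present"
    else
        echo "$dev: NO NSD descriptor found"
    fi
done
--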
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 
active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
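For reference, a minimal sketch of the mmsdrfs recovery path described above, with placeholder hostnames; this assumes the configuration on the surviving NSD server is intact and only the local copy was lost. It restores cluster configuration only and cannot bring back NSD descriptors that have been overwritten on the LUNs themselves.

--
# Sketch only: restore the local GPFS configuration on a rebuilt node from a
# healthy peer (hostnames are placeholders).

# Option 1: copy the cluster configuration file directly, as described above.
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs

# Option 2: let GPFS pull the configuration from another node.
/usr/lpp/mmfs/bin/mmsdrrestore -p mminsd6

# Then start GPFS on the repaired node.
/usr/lpp/mmfs/bin/mmstartup
--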
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
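For readability, the nsddevices override posted above, unwrapped and with comments added; the behaviour is unchanged and the LUN naming is specific to this site:

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices override from the message above, unwrapped for
# readability. Matches the device-mapper aliases for the DCS3800 LUNs.
CONTROLLER_REGEX='[ab]_lun[0-9]+'

# Emit each matching alias relative to /dev, followed by the GPFS device
# type ("dmm" for device-mapper multipath devices).
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
  echo mapper/$dev dmm
  #echo mapper/$dev generic
done

# Returning 0 tells GPFS to use only the devices listed above and to bypass
# its built-in discovery (/usr/lpp/mmfs/bin/mmdevdiscover).
return 0
--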
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. 
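A rough way to confirm that diagnosis on each LUN is to look for the GPT signature directly: a GPT header begins at byte offset 512 (assuming 512-byte logical sectors) with the eight ASCII characters "EFI PART", which is exactly the string that showed up in the dd output above. A sketch, using the same site-specific device pattern:

--
#!/bin/bash
# Sketch only: report which LUNs now carry a GPT header where the GPFS NSD
# descriptor used to live. Assumes 512-byte logical sectors and the
# /dev/mapper naming used earlier in the thread.
for dev in /dev/mapper/*[ab]_lun*; do
    sig=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null)
    if [ "$sig" = "EFI PART" ]; then
        echo "$dev: GPT header present, NSD descriptor likely overwritten"
    else
        echo "$dev: no GPT signature at LBA 1"
    fi
done
--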
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
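For anyone hitting the same situation, a minimal read-only sketch of this kind of check, run across every LUN at once: the /dev/mapper glob is taken from the multipath output earlier in this thread and the two strings ("NSD descriptor" and "EFI PART") are the ones shown in the dd examples, so both would need adjusting for other environments:

--
#!/bin/ksh
# Read the first 32 KiB of each GPFS LUN and report whether a GPFS NSD
# descriptor is still present or a GPT signature has been written over it.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    out=$(dd if="$dev" bs=1k count=32 2>/dev/null | strings)
    if echo "$out" | grep -q "NSD descriptor"; then
        echo "$dev: NSD descriptor present"
    elif echo "$out" | grep -q "EFI PART"; then
        echo "$dev: GPT signature found, NSD descriptor probably overwritten"
    else
        echo "$dev: neither string found"
    fi
done
--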
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 20:32:25 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 20:32:25 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <54514ED9.9030604@buzzard.me.uk> On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 20:47:51 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:47:51 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <54514ED9.9030604@buzzard.me.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: Jonathan, which script are you talking about? Thanks, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard Sent: Wednesday, October 29, 2014 2:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
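A rough read-only way to look for the secondary copy mentioned above: GPT keeps a backup header in the last sector of the disk, so reading just that sector shows whether a stale copy is still there (device names again assumed from earlier in this thread):

--
#!/bin/ksh
# Check the last 512-byte sector of each LUN for a backup GPT header.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sectors=$(blockdev --getsz "$dev")   # device size in 512-byte sectors
    if dd if="$dev" bs=512 skip=$((sectors - 1)) count=1 2>/dev/null | strings | grep -q "EFI PART"; then
        echo "$dev: backup GPT header present at end of disk"
    else
        echo "$dev: no backup GPT header found"
    fi
done
--

Actually clearing any stale labels is a separate, destructive step (the script Jonathan refers to, or generic tools such as wipefs or sgdisk) and should only be done once it is certain the LUN belongs to the damaged file system.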
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Wed Oct 29 21:01:06 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 21:01:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: <54515592.4050606@buzzard.me.uk> On 29/10/14 20:47, Jared David Baker wrote: > Jonathan, which script are you talking about? > The one here https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25 Use for detecting and clearing that secondary GPT table. Never used it of course, my disaster was caused by an idiot admin installing a new OS not mapping the disks out and then hit yes yes yes when asked if he wanted to blank the disks, the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM 50TB and 80 million files down the swanny. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From mark.bergman at uphs.upenn.edu Fri Oct 31 17:10:55 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 13:10:55 -0400 Subject: [gpfsug-discuss] mapping to hostname? Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm> Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark From bevans at pixitmedia.com Fri Oct 31 17:32:45 2014 From: bevans at pixitmedia.com (Barry Evans) Date: Fri, 31 Oct 2014 17:32:45 +0000 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <5453C7BD.8030608@pixitmedia.com> I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout. Cheers, Barry ArcaStream/Pixit Media mark.bergman at uphs.upenn.edu wrote: > Many GPFS logs& utilities refer to nodes via their name. > > I haven't found an "mm*" executable that shows the mapping between that > name an the hostname. > > Is there a simple method to map the designation to the node's > hostname? > > Thanks, > > Mark > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. 
From oehmes at us.ibm.com Fri Oct 31 18:20:40 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 31 Oct 2014 11:20:40 -0700 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: Hi, the official way to do this is mmdiag --network thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: mark.bergman at uphs.upenn.edu To: gpfsug main discussion list Date: 10/31/2014 10:11 AM Subject: [gpfsug-discuss] mapping to hostname? Sent by: gpfsug-discuss-bounces at gpfsug.org Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Fri Oct 31 18:57:44 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 14:57:44 -0400 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700." References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi> In the message dated: Fri, 31 Oct 2014 11:20:40 -0700, The pithy ruminations from Sven Oehme on to hostname?> were: => Hi, => => the official way to do this is mmdiag --network OK. I'm now using: mmdiag --network | awk '{if ( $1 ~ / => thx. Sven => => => ------------------------------------------ => Sven Oehme => Scalable Storage Research => email: oehmes at us.ibm.com => Phone: +1 (408) 824-8904 => IBM Almaden Research Lab => ------------------------------------------ => => => => From: mark.bergman at uphs.upenn.edu => To: gpfsug main discussion list => Date: 10/31/2014 10:11 AM => Subject: [gpfsug-discuss] mapping to hostname? => Sent by: gpfsug-discuss-bounces at gpfsug.org => => => => Many GPFS logs & utilities refer to nodes via their name. => => I haven't found an "mm*" executable that shows the mapping between that => name an the hostname. => => Is there a simple method to map the designation to the node's => hostname? => => Thanks, => => Mark => From stuartb at 4gh.net Fri Oct 3 18:19:08 2014 From: stuartb at 4gh.net (Stuart Barkley) Date: Fri, 3 Oct 2014 13:19:08 -0400 (EDT) Subject: [gpfsug-discuss] filesets and mountpoint naming Message-ID: Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. 
We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From bbanister at jumptrading.com Mon Oct 6 16:17:44 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 15:17:44 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Friday, October 03, 2014 12:19 PM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bbanister at jumptrading.com Mon Oct 6 16:36:17 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 15:36:17 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Oct 6 16:40:45 2014 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 6 Oct 2014 15:40:45 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com> Hi Stuart, We have a very similar setup. I use /gpfs01, /gpfs02 etc. 
and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. 
We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. > However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. 
This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 10:33:07 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
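(As a side note, a minimal sketch of the kind of waiter check described above, in case anyone wants to reproduce the hang; the file system and fileset names are hypothetical.)

# Dump current waiters on the local node
/usr/lpp/mmfs/bin/mmdiag --waiters

# Sweep all nodes in the cluster for long waiters (repeat on each cluster)
/usr/lpp/mmfs/bin/mmdsh -N all '/usr/lpp/mmfs/bin/mmdiag --waiters'

# Show the AFM fileset state and the gateway node handling its queue
mmafmctl newfs getstate -j cachefileset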
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
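(For reference, a rough illustration of the parallel rsync namespace chopping mentioned in the message above under point 4. The directory layout, node names and degree of parallelism are made up, and it assumes simple directory names without spaces; a real migration would need better balancing and error handling.)

# Split the namespace at the top-level directory boundary of the old file system
cd /gpfs/oldfs/projects && ls -d */ > /tmp/dirs.txt

# Run up to 8 rsyncs in parallel from a single node
xargs -a /tmp/dirs.txt -P 8 -I{} \
    rsync -a /gpfs/oldfs/projects/{} /gpfs/newfs/projects/{}

# Or fan the directory list out round-robin across several nodes
i=0
for d in $(cat /tmp/dirs.txt); do
    node="node$(( i % 4 ))"; i=$(( i + 1 ))
    ssh "$node" "rsync -a /gpfs/oldfs/projects/$d /gpfs/newfs/projects/$d" &
done
wait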
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) 
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 09 Oct 2014 13:02:44 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
Message-ID: <54367964.1050900@ebi.ac.uk> Hello everyone, Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting metadata on shared storage, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations ( http://www.fusionio.com/products/iodrive ) and, for reliability, replicate the metadata across all the servers. Will this work in case of a server failure? To make it clearer: if a server fails, I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 9 Oct 2014 19:31:28 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Just wanted to pass my GPFS RFE along: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 Description: GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis. Use case: This could be used for a very large number of file system management applications, including: 1) SOBAR (Scale-Out Backup And Restore) 2) Data Security Auditing and Monitoring applications 3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources. 4) Application file system access profiling Please vote for it if you feel it would also benefit your operation, thanks, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From service at metamodul.com Fri Oct 10 13:21:43 2014 From: service at metamodul.com (service at metamodul.com) Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST) Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de> > Bryan Banister hat am 9. Oktober 2014 um 21:31 > geschrieben: > > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgp at psu.edu Fri Oct 10 16:04:02 2014 From: pgp at psu.edu (Phil Pishioneri) Date: Fri, 10 Oct 2014 11:04:02 -0400 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <5437F562.1080609@psu.edu> On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 15:08:04 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <5437F562.1080609@psu.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister wrote: > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! 
> -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > >> Hmm... I didn't think to use the DMAPI interface. That could be a nice >> option. 
Has anybody done this already and are there any examples we could >> look at? >> >> Thanks! >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >> Sent: Friday, October 10, 2014 10:04 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >> >> On 10/9/14 3:31 PM, Bryan Banister wrote: >> > >> > Just wanted to pass my GPFS RFE along: >> > >> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >> > 0458 >> > >> > >> > *Description*: >> > >> > GPFS File System Manager should provide the option to log all file and >> > directory operations that occur in a file system, preferably stored in >> > a TSD (Time Series Database) that could be quickly queried through an >> > API interface and command line tools. ... >> > >> >> The rudimentaries for this already exist via the DMAPI interface in GPFS >> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >> DeveloperWorks forum: >> >> On 1/3/11 10:27 AM, dWForums wrote: >> > Author: >> > AlokK.Dhir >> > >> > Message: >> > We have a proof of concept which uses DMAPI to listens to and passively >> logs filesystem changes with a non blocking listener. This log can be used >> to generate backup sets etc. Unfortunately, a bug in the current DMAPI >> keeps this approach from working in the case of certain events. I am told >> 3.4.0.3 may contain a fix. We will gladly share the code once it is >> working. >> >> -Phil >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, and >> to please notify the sender immediately and destroy this email and any >> attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to the >> completeness or accuracy of this email or any attachments. This email is >> for informational purposes only and does not constitute a recommendation, >> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >> or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Hi Salvatore, We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints: * Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.) * Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available. * When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start ?a). I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work. Thx Paul Sanchez From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Salvatore Di Nardo Sent: Thursday, October 09, 2014 8:03 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? Hello everyone, Suppose we want to build a new GPFS storage using SAN attached storages, but instead to put metadata in a shared storage, we want to use FusionIO PCI cards locally on the servers to speed up metadata operation( http://www.fusionio.com/products/iodrive) and for reliability, replicate the metadata in all the servers, will this work in case of server failure? To make it more clear: If a server fail i will loose also a metadata vdisk. Its the replica mechanism its reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID: As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file systems to gather the values: -- heat.pol -- define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) -- heat.pol -- Produces output similar to this: /gpfs/.../specFile.pyc 535089836 5892 /gpfs/.../syspath.py 528685287 806 /gpfs/---/bwe.py 528160670 4607 Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values. 
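(A possible way to drive the heat.pol policy above and rank the results; the file system name and output paths are hypothetical, and the exact name of the deferred list file should be checked for the release in use.)

# FILE_HEAT values only accumulate once heat tracking is enabled, e.g.:
mmchconfig fileHeatPeriodMinutes=1440

# Build the candidate list without executing anything; with -I defer and
# -f /tmp/heat the list for the 'fh' rule typically lands in /tmp/heat.list.fh
mmapplypolicy gpfs01 -P heat.pol -I defer -f /tmp/heat

# Once the list is reduced to the "path heat size" form shown above,
# the hottest files are simply the largest values in the second column
sort -g -r -k2,2 /tmp/heat.reduced | head -50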
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > >> Id like this to see hot files >> >> On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < >> bbanister at jumptrading.com> wrote: >> >>> Hmm... I didn't think to use the DMAPI interface. That could be a nice >>> option. Has anybody done this already and are there any examples we could >>> look at? >>> >>> Thanks! >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >>> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >>> Sent: Friday, October 10, 2014 10:04 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >>> >>> On 10/9/14 3:31 PM, Bryan Banister wrote: >>> > >>> > Just wanted to pass my GPFS RFE along: >>> > >>> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >>> > 0458 >>> > >>> > >>> > *Description*: >>> > >>> > GPFS File System Manager should provide the option to log all file and >>> > directory operations that occur in a file system, preferably stored in >>> > a TSD (Time Series Database) that could be quickly queried through an >>> > API interface and command line tools. ... >>> > >>> >>> The rudimentaries for this already exist via the DMAPI interface in GPFS >>> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >>> DeveloperWorks forum: >>> >>> On 1/3/11 10:27 AM, dWForums wrote: >>> > Author: >>> > AlokK.Dhir >>> > >>> > Message: >>> > We have a proof of concept which uses DMAPI to listens to and >>> passively logs filesystem changes with a non blocking listener. This log >>> can be used to generate backup sets etc. Unfortunately, a bug in the >>> current DMAPI keeps this approach from working in the case of certain >>> events. I am told 3.4.0.3 may contain a fix. We will gladly share the >>> code once it is working. >>> >>> -Phil >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged information. 
>>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, and >>> to please notify the sender immediately and destroy this email and any >>> attachments. Email transmission cannot be guaranteed to be secure or >>> error-free. The Company, therefore, does not make any guarantees as to the >>> completeness or accuracy of this email or any attachments. This email is >>> for informational purposes only and does not constitute a recommendation, >>> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >>> or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com> We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.) This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems. -Paul Sanchez -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Tuesday, September 23, 2014 11:47 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. 
its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister wrote: > I agree with Ben, I think. > > > > I don?t want to use the ILM policy engine as that puts a direct workload > against the metadata storage and server resources. We need something > out-of-band, out of the file system operational path. > > > > Is there a simple DMAPI daemon that would log the file system namespace > changes that we could use? > > > > If so are there any limitations? > > > > And is it possible to set this up in an HA environment? > > > > Thanks! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > > > querying this through the policy engine is far to late to do any thing > useful with it > > > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > > Ben, > > > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > > > thx. Sven > > > > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > > Id like this to see hot files > > > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! > -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. 
> > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) 
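The mount dependency mentioned above maps onto a few concrete settings. A minimal sketch, assuming a file system named gpfs01 (a placeholder) and quoting option names from memory, so check them against the Administration Guide for your GPFS level before use:

# Is DMAPI enabled on the file system? (gpfs01 is a placeholder device name)
mmlsfs gpfs01 -z

# DMAPI is a per-file-system flag; changing it normally requires the file
# system to be unmounted everywhere first:
mmchfs gpfs01 -z yes

# With DMAPI enabled, a mount may wait for a DM application to register its
# event disposition. dmapiMountTimeout (seconds) bounds that wait, so a
# missing or crashed daemon should delay mounts rather than block them
# indefinitely -- verify the exact behaviour on your code level:
mmchconfig dmapiMountTimeout=60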
Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. 
This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
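On the File Heat suggestion earlier in this thread: a rough sketch of enabling it and pulling a ranked hot-file list through the policy engine, with the caveat already raised that a policy scan is an after-the-fact snapshot rather than a live event feed. The file system name (gpfs01), the output path and the exact policy syntax are assumptions from memory rather than something verified on this cluster:

# Enable file heat tracking cluster-wide (tune the period/decay to taste):
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# hot.pol -- rank files by FILE_HEAT and emit them as an external list:
#   RULE EXTERNAL LIST 'hotfiles' EXEC ''
#   RULE 'hot' LIST 'hotfiles' WEIGHT(FILE_HEAT) SHOW(varchar(FILE_HEAT))

# Run the scan, defer any action, and just write the list under /tmp/hotfiles:
mmapplypolicy gpfs01 -P hot.pol -I defer -f /tmp/hotfiles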
-------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Sat, 11 Oct 2014 10:37:10 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <5438FA46.7090902@ebi.ac.uk> Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups. Each of them with a drive and set 3 metadata replica as the default one. I have not considered that the vdisks could be off after a 'reboot' or failure, so that's a good point, but anyway , after a failure or even a standard reboot, the server and the cluster have to be checked anyway, and i always check the vdisk status, so no big deal. Your answer made me consider also another thing... Once put them back online, they will be restriped automatically or should i run every time 'mmrestripefs' to verify/correct the replicas? I understand that use lodal disk sound strange, infact our first idea was just to add some ssd to the shared storage, but then we considered that the sas cable could be a huge bottleneck. The cost difference is not huge and the fusioio locally on the server would make the metadata just fly. On 10/10/14 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start ?a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
> > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Sun Oct 12 17:03:56 2014 From: service at metamodul.com (MetaService) Date: Sun, 12 Oct 2014 18:03:56 +0200 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <1413129836.4846.9.camel@titan> My preferred naming convention is to use the cluster name or part of it as the base directory for all GPFS mounts. Example: Clustername=c1_eum would mean that: /c1_eum/ would be the base directory for all Cluster c1_eum GPFSs In case a second local cluster would exist its root mount point would be /c2_eum/ Even in case of mounting remote clusters a naming collision is not very likely. BTW: For accessing the the final directories /.../scratch ... the user should not rely on the mount points but on given variables provided. CLS_HOME=/... CLS_SCRATCH=/.... hth Hajo From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Fri, 10 Oct 2014 17:48:24 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <54380DD8.2020909@ocf.co.uk> Hi Salvatore, Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whist the GPFS cluster renegotiates. The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS, once you've started the disk again it will rediscover and scan the metadata for any missing updates, these updates are then repaired/replicated again. Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 10/10/2014 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. 
You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start ?a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable? > > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lhorrocks-barlow.vcf Type: text/x-vcard Size: 388 bytes Desc: not available URL: From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 13 Oct 2014 13:10:17 +0200 Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Message-ID: GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Oct 14th 11:15-12:05 Room 18 http://sched.co/1uMYEWK Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From service at metamodul.com Mon Oct 13 16:49:44 2014 From: service at metamodul.com (service at metamodul.com) Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST) Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany In-Reply-To: References: Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de> Hallo Frank, the announcement is a little bit to late for me. Would be nice if you could share your speech later. cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 16:23:01 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> Message-ID: <543D3FD5.1060705@ebi.ac.uk> On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs > and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. 
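A quick way to gauge how widespread this VdiskLogAppendCondvar contention is across the recovery group servers, and to rule out a degraded bond, might be something like the sketch below; the node names are the ones from the mmlscluster output that follows, and bond0 is an assumed interface name:

# Count fast-write-log waiters on each GSS server:
for n in gss01a gss01b gss02a gss02b gss03a gss03b; do
    echo -n "$n: "; ssh $n 'mmdiag --waiters | grep -c VdiskLogAppendCondvar'
done

# Look for down slaves or LACP mismatches on the 4x10Gb bond (bond0 assumed):
cat /proc/net/bonding/bond0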
We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 09:22:41 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D3FD5.1060705@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: your GSS code version is very backlevel. can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all thx. 
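For anyone collecting the same data, a small sketch that gathers those outputs into one file; the recovery group names in the loop are placeholders and should be replaced with the real names printed by the first mmlsrecoverygroup call:

OUT=/tmp/gss-diag.$(date +%Y%m%d).txt
mmlsrecoverygroup                      >  $OUT      # lists the recovery group names
for rg in gss01a_rg gss01b_rg; do                   # placeholders -- substitute real names
    mmlsrecoverygroup $rg -L --pdisk   >> $OUT
done
mmlsconfig                             >> $OUT
mmlsfs all                             >> $OUT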
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug-discuss at gpfsug.org Date: 10/14/2014 08:23 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org On 14/10/14 15:51, Sven Oehme wrote: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 17:39:18 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: <543D51B6.3070602@ebi.ac.uk> Thanks in advance for your help. We have 6 RG: recovery group vdisks vdisks servers ------------------ ----------- ------ ------- gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk Check the attached file for RG details. 
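The attached recovery group details are the per-RG view Sven asked for (mmlsrecoverygroup RGNAME -L --pdisk). A minimal sketch for gathering that same set of outputs into one file, assuming the six recovery group names listed above and a hypothetical destination of /tmp/gss-diag.out:

# Per-recovery-group detail including pdisks, then cluster configuration
# and file system attributes, all appended to a single report file.
out=/tmp/gss-diag.out
: > "$out"
for rg in gss01a gss01b gss02a gss02b gss03a gss03b; do
    echo "=== mmlsrecoverygroup $rg -L --pdisk ===" >> "$out"
    mmlsrecoverygroup "$rg" -L --pdisk >> "$out"
done
{ echo "=== mmlsconfig ==="; mmlsconfig; echo "=== mmlsfs all ==="; mmlsfs all; } >> "$out"

Keeping the per-RG sections in one file and in a fixed order makes it easier to diff a later capture against this one if the waiters come back.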
Following mmlsconfig: [root at gss01a ~]# mmlsconfig Configuration data for cluster GSS.ebi.ac.uk: --------------------------------------------- myNodeConfigNumber 1 clusterName GSS.ebi.ac.uk clusterId 17987981184946329605 autoload no dmapiFileHandleSize 32 minReleaseLevel 3.5.0.11 [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] pagepool 38g nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m numaMemoryInterleave yes prefetchPct 5 maxblocksize 16m nsdRAIDTracks 128k ioHistorySize 64k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 64k nsdRAIDFastWriteFSMetadataLimit 256k nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 4096 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 syncWorkerThreads 256 tscWorkerPool 64 nsdInlineWriteMax 32k maxFilesToCache 12k maxStatCache 512 maxGeneralThreads 1280 flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 maxMBpS 16000 maxReceiverThreads 128 worker1Threads 1024 worker3Threads 32 [common] cipherList AUTHONLY socketMaxListenConnections 1500 failureDetectionTime 60 [common] adminMode central File systems in cluster GSS.ebi.ac.uk: -------------------------------------- /dev/gpfs1 For more configuration paramenters i also attached a file with the complete output of mmdiag --config. and mmlsfs: File system attributes for /dev/gpfs1: ====================================== flag value description ------------------- ------------------------ ----------------------------------- -f 32768 Minimum fragment size in bytes (system pool) 262144 Minimum fragment size in bytes (other pools) -i 512 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1000 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 8388608 Block size (other pools) -Q user;group;fileset Quotas enforced user;group;fileset Default quotas enabled --filesetdf no Fileset df enabled? -V 13.23 (3.5.0.7) File system version --create-time Tue Mar 18 16:01:24 2014 File system creation time -u yes Support for large LUNs? -z no Is DMAPI enabled? -L 4194304 Logfile size -E yes Exact mtime mount option -S yes Suppress atime mount option -K whenpossible Strict replica allocation option --fastea yes Fast external attributes enabled? 
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> [...] > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4 ! pagepool 40802189312 pagepoolMaxPhysMemPct 75 prefetchAggressiveness 2 prefetchAggressivenessRead -1 prefetchAggressivenessWrite -1 ! prefetchPct 5 prefetchThreads 72 readReplicaPolicy default remoteMountTimeout 10 sharedMemLimit 0 sharedMemReservePct 15 sidAutoMapRangeLength 15000000 sidAutoMapRangeStart 15000000 ! socketMaxListenConnections 1500 socketRcvBufferSize 0 socketSndBufferSize 0 statCacheDirPct 10 subnets ! syncWorkerThreads 256 tiebreaker system tiebreakerDisks tokenMemLimit 536870912 treatOSyncLikeODSync 1 tscTcpPort 1191 ! tscWorkerPool 64 uidDomain GSS.ebi.ac.uk uidExpiration 36000 unmountOnDiskFail no useDIOXW 1 usePersistentReserve 0 verbsLibName libibverbs.so verbsPorts verbsRdma disable verbsRdmaCm disable verbsRdmaCmLibName librdmacm.so verbsRdmaMaxSendBytes 16777216 verbsRdmaMinBytes 8192 verbsRdmaQpRtrMinRnrTimer 18 verbsRdmaQpRtrPathMtu 2048 verbsRdmaQpRtrSl 0 verbsRdmaQpRtrSlDynamic 0 verbsRdmaQpRtrSlDynamicTimeout 10 verbsRdmaQpRtsRetryCnt 6 verbsRdmaQpRtsRnrRetry 6 verbsRdmaQpRtsTimeout 18 verbsRdmaSend 0 verbsRdmasPerConnection 8 verbsRdmasPerNode 0 verbsRdmaTimeout 18 verifyGpfsReady 0 ! worker1Threads 1024 ! worker3Threads 32 writebehindThreshold 524288

From oehmes at us.ibm.com Tue Oct 14 18:23:50 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:23:50 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D51B6.3070602@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

You are basically running GSS 1.0 code, while the current version is GSS 2.0
(which replaced version 1.5 two months ago).

GSS 1.5 and 2.0 have several enhancements in this area, so I strongly
encourage you to upgrade your systems.

If you can describe your workload in a bit more detail, there may also be
additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo
> To: gpfsug main discussion list
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> Thanks in advance for your help.
>
> We have 6 RGs:
>
> recovery group        vdisks  vdisks  servers
> ------------------    ------  ------  -------
> gss01a                     4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                     4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                     4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                     4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                     4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                     4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have a dedicated admin network.
>
> [root at gss03a ~]# mmlscluster
>
> GPFS cluster information
> ========================
>   GPFS cluster name:         GSS.ebi.ac.uk
>   GPFS cluster id:           17987981184946329605
>   GPFS UID domain:           GSS.ebi.ac.uk
>   Remote shell command:      /usr/bin/ssh
>   Remote file copy command:  /usr/bin/scp
>
> GPFS cluster configuration servers:
> -----------------------------------
>   Primary server:    gss01a.ebi.ac.uk
>   Secondary server:  gss02b.ebi.ac.uk
>
>  Node  Daemon node name   IP address   Admin node name    Designation
> -----------------------------------------------------------------------
>    1   gss01a.ebi.ac.uk   10.7.28.2    gss01a.ebi.ac.uk   quorum-manager
>    2   gss01b.ebi.ac.uk   10.7.28.3    gss01b.ebi.ac.uk   quorum-manager
>    3   gss02a.ebi.ac.uk   10.7.28.67   gss02a.ebi.ac.uk   quorum-manager
>    4   gss02b.ebi.ac.uk   10.7.28.66   gss02b.ebi.ac.uk   quorum-manager
>    5   gss03a.ebi.ac.uk   10.7.28.34   gss03a.ebi.ac.uk   quorum-manager
>    6   gss03b.ebi.ac.uk   10.7.28.35   gss03b.ebi.ac.uk   quorum-manager
>
> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different
> subnets because of datacenter constraints (they are not physically
> in the same row, and due to network constraints it was not possible to
> put them in the same subnet). The packets are routed, but this should
> not be a problem as there is 160 Gb/s of bandwidth between them.
>
> Regards,
> Salvatore
>
> ------------------------------------------
> Sven Oehme
> Scalable Storage Research
> email: oehmes at us.ibm.com
> Phone: +1 (408) 824-8904
> IBM Almaden Research Lab
> ------------------------------------------
>
> From: Salvatore Di Nardo
> To: gpfsug main discussion list
> Date: 10/14/2014 07:40 AM
> Subject: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> hello all,
> could someone explain to me the meaning of these waiters?
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
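A quick way to see whether the 'wait for permission to append to log' waits really dominate on a server is to summarize the waiter output by reason. This is only a sketch and assumes the listing above was captured with mmdiag --waiters (or an equivalent waiter dump) on the GSS node in question; the grep/sort/uniq part is plain shell:

# Count the waiters currently blocked on the vdisk log append condvar
# (sketch; assumes the listing above came from "mmdiag --waiters").
mmdiag --waiters | grep -c 'VdiskLogAppendCondvar'

# Group all current waiters by their reason string to see which wait dominates.
mmdiag --waiters | grep -o "reason '[^']*'" | sort | uniq -c | sort -rn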
From zgiles at gmail.com Tue Oct 14 18:32:50 2014
From: zgiles at gmail.com (Zachary Giles)
Date: Tue, 14 Oct 2014 13:32:50 -0400
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

Except that, AFAIK, no one has published how to update GSS or where the
update code is. All I've heard is "contact your sales rep". Any pointers?

On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote:
> You are basically running GSS 1.0 code, while the current version is GSS 2.0
> (which replaced version 1.5 two months ago).
>
> GSS 1.5 and 2.0 have several enhancements in this area, so I strongly
> encourage you to upgrade your systems.
>
> If you can describe your workload in a bit more detail, there may also be
> additional knobs we can turn to change the behavior.

--
Zach Giles
zgiles at gmail.com
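Whatever the official upgrade path turns out to be, it is worth recording the level each GSS building block currently runs before touching anything. A minimal sketch, assuming the GPFS commands used elsewhere in this thread are in the PATH and that the servers are RPM-based; the package name pattern is an assumption, so adjust it to whatever your distribution actually ships:

# GPFS daemon build and the cluster-wide minimum release level.
mmdiag --version
mmlsconfig minReleaseLevel

# Installed GPFS/GSS related packages (pattern is a guess, adjust as needed).
rpm -qa | grep -iE 'gpfs|gss' | sort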
From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:38:10 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

I personally don't know, since I am in GPFS Research, not in support :-)
But have you tried to contact your sales rep? If you are not successful with
that, shoot me a direct email with details about your company name, country
and customer number, and I will try to get somebody to help you.

thx. Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From: Zachary Giles
To: gpfsug main discussion list
Date: 10/14/2014 10:33 AM
Subject: Re: [gpfsug-discuss] wait for permission to append to log
Sent by: gpfsug-discuss-bounces at gpfsug.org

Except that, AFAIK, no one has published how to update GSS or where the
update code is. All I've heard is "contact your sales rep". Any pointers?
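Since the thread keeps coming back to the fast write log, it may also help to snapshot the related tuning values before and after any code upgrade, so the effect of the new defaults is visible. A sketch that only queries parameters already shown in the mmlsconfig/mmdiag output earlier in this thread:

# Show the fast-write-log and flusher related settings on a GSS server.
mmdiag --config | grep -E 'nsdRAIDFastWrite|nsdRAIDFlusher|logBufferCount|logWrap' | sort

# Keep a dated copy so pre- and post-upgrade values can be compared.
mmdiag --config > /tmp/mmdiag-config.$(hostname -s).$(date +%Y%m%d)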
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
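(For anyone digging into the same symptom later: Sven's reply above points at contention on the GSS fast write log, and a quick way to confirm that the stuck NSD threads are all parked on the same vdisk log condition variable is to tally the waiter reasons on the recovery group servers. This is only a rough sketch -- mmdiag --waiters and mmlsrecoverygroup -L --pdisk are the standard commands, but the node list is simply copied from the mmlscluster output above and the grep/sort pipeline is illustrative, not an official procedure.)

# Summarise active waiters by reason across the GSS servers
mmdsh -N gss01a,gss01b,gss02a,gss02b,gss03a,gss03b "/usr/lpp/mmfs/bin/mmdiag --waiters" | grep -o "reason '[^']*'" | sort | uniq -c | sort -rn

# Recovery group detail Sven asked for; run plain mmlsrecoverygroup first to get the RG names
mmlsrecoverygroup
mmlsrecoverygroup RGNAME -L --pdisk

If the counts are dominated by 'wait for permission to append to log' on one node pair, as in the dump above, it suggests log appends are being serialised on that recovery group -- an inference to confirm with the recovery group output, not something the waiters alone prove.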
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
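(A rough sketch of the moving parts in a clustered Samba setup like the one Tony describes, for anyone following along. This reflects the CTDB 2.x-era configuration style he mentions, and every path, IP address and interface name below is an invented example rather than Kingston's actual config; check ctdb(7) and smb.conf(5) for the releases you actually run.)

# /etc/ctdb/nodes - one private, cluster-internal IP per Samba node
10.0.0.11
10.0.0.12

# /etc/ctdb/public_addresses - floating client-facing IPs that CTDB moves between healthy nodes
192.168.104.50/24 eth0
192.168.103.50/24 eth1

# /etc/sysconfig/ctdb - the recovery lock must live on the shared GPFS file system
CTDB_RECOVERY_LOCK=/gpfs/ctdb/.recovery_lock
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=yes
# CTDB_MANAGES_NFS=yes   # only if kernel NFS exports are also under CTDB control

# smb.conf [global] - the one setting that makes Samba cluster-aware
clustering = yes

The public_addresses file is also how different VLANs get served from different interfaces, which is essentially Bill's third question further down the thread.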
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so, are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far too late to do anything useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on; some discussion about it is here: https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: I'd like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks!
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year. 
Here is basic agenda that was provided: GPFS/Elastic Storage User Group Monday, November 17, 2014 3:00 PM-5:00 PM: GPFS/Elastic Storage User Group [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] IBM Software Defined Storage strategy update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Customer presentations [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Future directions such as object storage and OpenStack integration [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage server update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage roadmap (*NDA required) 5:00 PM: Reception Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance. I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature. Thanks! -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 76 bytes Desc: image001.gif URL: From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org> Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: > > I reviewed my RFE request again and notice that it has been marked as > ?Private? and I think this is preventing people from voting on this > RFE. 
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
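For anyone who wants to try the File Heat approach Sven points to in the thread above, a minimal sketch follows. The tunable names (fileHeatPeriodMinutes, fileHeatLossPercent) and the FILE_HEAT policy attribute come from the GPFS documentation of that era, but the file system name 'gpfs01', the list name and the numeric values are placeholders for illustration only, not a tested recommendation:

--
# Enable file heat tracking cluster-wide (values here are illustrative).
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# Policy that lists files ordered by heat; 'hotlist' is an arbitrary list name.
cat > /tmp/hotfiles.pol <<'EOF'
RULE EXTERNAL LIST 'hotlist' EXEC ''
RULE 'hot' LIST 'hotlist' WEIGHT(FILE_HEAT) SHOW(VARCHAR(FILE_HEAT) || ' ' || VARCHAR(KB_ALLOCATED))
EOF

# Generate the candidate list only; -I defer means nothing is migrated or deleted.
mmapplypolicy gpfs01 -P /tmp/hotfiles.pol -f /tmp/hotfiles -I defer
--

With -I defer the run just writes the weighted file list under the -f prefix, which is usually enough to see which files are currently "hot".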
From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 24 Oct 2014 18:58:07 +0000
Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations
In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com>

It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem with my configuration was that the NSD client cluster had not been completely updated to GPFS 4.1.0-3; a few nodes are still running 3.5.0-20, which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration "requirement" isn't documented in the Advanced Admin Guide, but it makes sense that it is required, since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD client cluster and it works as desired. Thanks Kalyan and others for their feedback.

Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 "Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node.

Thanks!
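Given that constraint (one prefetch job per fileset, but different filesets in parallel), the only parallelism available at this level is one job per AFM fileset. A rough sketch of that flow, assuming the per-fileset file lists have already been generated with mmapplypolicy as discussed in this thread; the device name 'newfs', the fileset names and the list-file paths are placeholders:

--
# One mmafmctl prefetch job per AFM fileset; jobs on different filesets may run concurrently.
# 'newfs', the fileset names and the list files are hypothetical.
for f in fileset01 fileset02 fileset03; do
    mmafmctl newfs prefetch -j $f --home-inode-file /var/tmp/prefetch.$f.list &
done
wait
--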
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
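For comparison, a minimal sketch of the parallel rsync alternative mentioned above, assuming the top-level directories of the old file system are a sensible unit of work; the mount points and the parallelism factor are placeholders, and a final sync pass is still needed at cutover:

--
# One rsync per top-level directory, at most 8 running at a time.
# /gpfs_old and /gpfs_new are hypothetical mount points.
cd /gpfs_old && ls -d */ | xargs -P 8 -I{} rsync -aH --numeric-ids {} /gpfs_new/{}
--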
Cheers,
-Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information.
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Wed Oct 29 13:59:40 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 29 Oct 2014 13:59:40 +0000 Subject: [gpfsug-discuss] Storagebeers, Nov 13th Message-ID: <5450F2CC.3070302@gpfsug.org> Hello all, I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members. http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/ I'll be popping along. Hopefully see you there. Jez From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 15:31:31 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat were categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I'm not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I've got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your system is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option.
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
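On the DiskGPTRecovery setting shown above: the same ASU tool that displays the value can also change it, which is how the GPT-restore behaviour gets switched off. This is a sketch only; the exact setting name and the allowed values depend on the machine type and uEFI level, so confirm with asu64 showvalues before changing anything:

--
# Assumed example: set DiskGPTRecovery so the uEFI never rewrites a GPT
# header over a disk GPFS is using, then re-read the value to confirm.
asu64 set DiskGPTRecovery.DiskGPTRecovery "None"
asu64 showvalues DiskGPTRecovery.DiskGPTRecovery
--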
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
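To make the mmsdrfs suggestion above concrete: the GPFS configuration file lives in the same place on every node, so recovering it on a rebuilt node is usually a straight copy from a healthy peer, or the equivalent mmsdrrestore call. A sketch, using the NSD server names from this thread as stand-ins; note that this only restores cluster configuration data and does not touch the NSD descriptors on the disks themselves:

--
# Copy the cluster configuration file from a healthy NSD server ...
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
# ... or have GPFS fetch and re-apply it from that node.
mmsdrrestore -p mminsd6
--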
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I?ve created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don?t seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
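For anyone who has not written one, a /var/mmfs/etc/nsddevices user exit of the kind whose output is quoted above is only a few lines. The sketch below is modelled on the sample shipped with GPFS (/usr/lpp/mmfs/samples/nsddevices.sample), with the device pattern guessed from the LUN names in this thread; it emits "device devicetype" pairs and returns 0 so that GPFS considers only the multipath aliases rather than the underlying sd* paths:

--
#!/bin/ksh
# Emit one "device devicetype" line per dm-multipath alias.
CONTROLLER_REGEX='dcs3800u31[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep "$CONTROLLER_REGEX" )
do
    echo mapper/$dev dmm
done
# Returning 0 tells GPFS to use only the devices listed above and to skip
# its built-in device discovery.
return 0
--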
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
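Since the stanza files used to create these NSDs come up earlier in the thread, this is roughly what a single mmcrnsd stanza looks like, including the failureGroup field that was left to default here. The values are illustrative guesses based on the names in this thread, not the actual stanza file, and re-running mmcrnsd writes fresh descriptors, so on its own it is not a recovery path for data already on the disks:

--
%nsd:
  device=/dev/mapper/dcs3800u31a_lun0
  nsd=dcs3800u31a_lun0
  servers=mminsd5,mminsd6
  usage=dataAndMetadata
  failureGroup=1
  pool=system
--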
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
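To run the strings check that produced the "EFI PART" output above across every LUN at once rather than just dm-0, a small read-only loop is enough; a sketch, with the device glob matching the alias naming used in this thread:

--
# Survey what currently sits in the header area of each LUN (read-only).
for d in /dev/mapper/dcs3800u31*_lun*; do
    echo "== $d =="
    dd if="$d" bs=1k count=32 2>/dev/null | strings | head -5
done
--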
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
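A side note on the check Sven describes above: the same dd/strings test can be repeated across every multipath LUN to see which devices still carry a GPFS NSD descriptor and which now show a foreign GPT signature ("EFI PART") instead. The loop below is only a sketch; the /dev/mapper/dcs3800u31[ab]_lun* naming is assumed from the listings earlier in this thread.
--
#!/bin/ksh
# Sketch: scan each multipath LUN for a surviving GPFS NSD descriptor.
# The dcs3800u31[ab]_lun* device names are assumed from this thread.
for dev in /dev/mapper/dcs3800u31[ab]_lun*
do
    echo "=== $dev ==="
    # A healthy NSD prints an "NSD descriptor ... created by GPFS" string
    # from its first 32 KiB; "EFI PART" instead means a GPT label sits there.
    dd if=$dev bs=1k count=32 2>/dev/null | strings | egrep 'NSD descriptor|EFI PART'
done
--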
From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com>
Apologies Sven, w/o comments below:
--
#!/bin/ksh
CONTROLLER_REGEX='[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
    echo mapper/$dev dmm
    #echo mapper/$dev generic
done
# Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover),
return 0
--
Best, Jared
From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx.
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
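For readers following the nsddevices user exit above: GPFS runs /var/mmfs/etc/nsddevices ahead of its built-in discovery, expects one "device devicetype" line per disk on stdout, and skips /usr/lpp/mmfs/bin/mmdevdiscover when the exit returns 0. The sketch below is only an illustration of that contract, not the script used on this cluster; it combines the exit with the descriptor check from earlier in the thread so that a device is reported only while it still carries an NSD descriptor.
--
#!/bin/ksh
# Illustrative variant of /var/mmfs/etc/nsddevices (not the one from this
# thread): report a mapper device only if it still has an NSD descriptor.
CONTROLLER_REGEX='[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
    if dd if=/dev/mapper/$dev bs=1k count=32 2>/dev/null | strings | grep -q 'NSD descriptor'
    then
        echo mapper/$dev dmm
    fi
done
# return 0 bypasses the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover);
# a non-zero return lets the built-in discovery run as well.
return 0
--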
From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID:
Hi, based on what I see, your BIOS or FW update wiped the NSD descriptor by restoring a GPT table at the start of a disk that shouldn't have a GPT table to begin with, as it's under the control of GPFS. Future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. If you want this officially confirmed I would still open a PMR, but at that point, given that you don't seem to have any production data on it from what I see in your response, you should recreate the filesystem.
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------
From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org
Apologies Sven, w/o comments below:
--
#!/bin/ksh
CONTROLLER_REGEX='[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
    echo mapper/$dev dmm
    #echo mapper/$dev generic
done
# Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover),
return 0
--
Best, Jared
From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate.
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
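For reference, the same check can be run across every GPFS LUN in one pass. This is only a rough sketch, assuming the device-mapper aliases follow the dcs3800u31[ab]_lun* pattern used in this thread: a healthy NSD should show an "NSD descriptor" string in its first 32 KiB, while an "EFI PART" string suggests a GPT label has been written over it.

--
#!/bin/bash
# Rough sketch: scan each GPFS LUN for an NSD descriptor in its first 32 KiB.
# The dcs3800u31[ab]_lun* glob is illustrative -- adjust to your multipath aliases.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sig=$(dd if="$dev" bs=1k count=32 2>/dev/null | strings | egrep -m1 'NSD descriptor|EFI PART')
    echo "$dev: ${sig:-no NSD descriptor or GPT signature found}"
done
--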
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
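For anyone reading this later, the mmsdrfs transfer Ed describes can look roughly like the following. The hostnames are placeholders, and the mmsdrrestore variant is the supported way to resync a node's configuration; check the man page for the exact options on your release.

--
# Rough sketch of the recovery Ed describes, run from the node that lost its
# configuration. "nsd-good" is a placeholder for a healthy NSD server.
scp nsd-good:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs

# Alternatively, let GPFS pull the configuration from a node that still has a
# valid copy (option names may vary slightly between releases):
mmsdrrestore -p nsd-good -R /usr/bin/scp
--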
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From jonathan at buzzard.me.uk  Wed Oct 29 20:32:25 2014
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 29 Oct 2014 20:32:25 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: 
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com>
Message-ID: <54514ED9.9030604@buzzard.me.uk>

On 29/10/14 20:25, Sven Oehme wrote:
> Hi,
>
> based on what i see is your BIOS or FW update wiped the NSD descriptor
> by restoring a GPT table on the start of a disk that shouldn't have a
> GPT table to begin with as its under control of GPFS.
> future releases of GPFS prevent this by writing our own GPT label to the
> disks so other tools don't touch them, but that doesn't help in your
> case any more. if you want this officially confirmed i would still open
> a PMR, but at that point given that you don't seem to have any
> production data on it from what i see in your response you should
> recreate the filesystem.
>

However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again...

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From Jared.Baker at uwyo.edu  Wed Oct 29 20:47:51 2014
From: Jared.Baker at uwyo.edu (Jared David Baker)
Date: Wed, 29 Oct 2014 20:47:51 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: <54514ED9.9030604@buzzard.me.uk>
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk>
Message-ID: 

Jonathan, which script are you talking about?

Thanks, Jared

-----Original Message-----
From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard
Sent: Wednesday, October 29, 2014 2:32 PM
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] Server lost NSD mappings

On 29/10/14 20:25, Sven Oehme wrote:
> Hi,
>
> based on what i see is your BIOS or FW update wiped the NSD descriptor
> by restoring a GPT table on the start of a disk that shouldn't have a
> GPT table to begin with as its under control of GPFS.
> future releases of GPFS prevent this by writing our own GPT label to the
> disks so other tools don't touch them, but that doesn't help in your
> case any more. if you want this officially confirmed i would still open
> a PMR, but at that point given that you don't seem to have any
> production data on it from what i see in your response you should
> recreate the filesystem.
>

However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again...

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.
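For context, the backup copy of a GPT lives in the last sector of the disk, so a crude detection check looks roughly like the sketch below. This is not the developerWorks script Jonathan refers to later in the thread; it assumes 512-byte logical sectors and the multipath aliases used earlier, and it only detects, it does not clear anything.

--
#!/bin/bash
# Rough sketch: look for a backup GPT header in the last sector of each LUN.
# Detection only -- do not wipe anything until you are sure what it belongs to.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sectors=$(blockdev --getsz "$dev")      # device size in 512-byte sectors
    if dd if="$dev" bs=512 skip=$((sectors - 1)) count=1 2>/dev/null | strings | grep -q 'EFI PART'; then
        echo "$dev: backup GPT header present"
    else
        echo "$dev: no backup GPT header found"
    fi
done
--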
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk  Wed Oct 29 21:01:06 2014
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 29 Oct 2014 21:01:06 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: 
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk>
Message-ID: <54515592.4050606@buzzard.me.uk>

On 29/10/14 20:47, Jared David Baker wrote:
> Jonathan, which script are you talking about?
>

The one here https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25

Use for detecting and clearing that secondary GPT table. Never used it of course; my disaster was caused by an idiot admin installing a new OS without mapping the disks out and then hitting yes yes yes when asked if he wanted to blank the disks, which the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM, 50TB and 80 million files down the swanny.

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From mark.bergman at uphs.upenn.edu  Fri Oct 31 17:10:55 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 13:10:55 -0400
Subject: [gpfsug-discuss] mapping to hostname?
Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm>

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name and the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark

From bevans at pixitmedia.com  Fri Oct 31 17:32:45 2014
From: bevans at pixitmedia.com (Barry Evans)
Date: Fri, 31 Oct 2014 17:32:45 +0000
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <5453C7BD.8030608@pixitmedia.com>

I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout.

Cheers,
Barry
ArcaStream/Pixit Media

mark.bergman at uphs.upenn.edu wrote:
> Many GPFS logs & utilities refer to nodes via their name.
>
> I haven't found an "mm*" executable that shows the mapping between that
> name and the hostname.
>
> Is there a simple method to map the designation to the node's
> hostname?
>
> Thanks,
>
> Mark
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email.
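If you go Barry's route, the tscomm dump is verbose, so filtering helps. This is only a sketch: mmfsadm is undocumented service tooling, its output format varies between releases, and the node designations may appear in different fields.

--
# Rough sketch: pull the connection lines that carry node designations.
mmfsadm saferdump tscomm | egrep 'c[0-9]+n[0-9]+'
--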
From oehmes at us.ibm.com  Fri Oct 31 18:20:40 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Fri, 31 Oct 2014 11:20:40 -0700
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: 

Hi,

the official way to do this is mmdiag --network

thx. Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From: mark.bergman at uphs.upenn.edu
To: gpfsug main discussion list
Date: 10/31/2014 10:11 AM
Subject: [gpfsug-discuss] mapping to hostname?
Sent by: gpfsug-discuss-bounces at gpfsug.org

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name and the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mark.bergman at uphs.upenn.edu  Fri Oct 31 18:57:44 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 14:57:44 -0400
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700."
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi>

In the message dated: Fri, 31 Oct 2014 11:20:40 -0700,
The pithy ruminations from Sven Oehme on to hostname?> were:

=> Hi,
=>
=> the official way to do this is mmdiag --network

OK. I'm now using: mmdiag --network | awk '{if ( $1 ~ /

=> thx. Sven
=>
=>
=> ------------------------------------------
=> Sven Oehme
=> Scalable Storage Research
=> email: oehmes at us.ibm.com
=> Phone: +1 (408) 824-8904
=> IBM Almaden Research Lab
=> ------------------------------------------
=>
=>
=>
=> From: mark.bergman at uphs.upenn.edu
=> To: gpfsug main discussion list
=> Date: 10/31/2014 10:11 AM
=> Subject: [gpfsug-discuss] mapping to hostname?
=> Sent by: gpfsug-discuss-bounces at gpfsug.org
=>
=>
=>
=> Many GPFS logs & utilities refer to nodes via their name.
=>
=> I haven't found an "mm*" executable that shows the mapping between that
=> name and the hostname.
=>
=> Is there a simple method to map the designation to the node's
=> hostname?
=>
=> Thanks,
=>
=> Mark
=>
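The designation being asked about is GPFS's internal node identifier, normally written in angle brackets (for example <c0n12>); the bracketed examples appear to have been stripped when these messages were archived, including the tail of Mark's awk one-liner above. A sketch of the kind of mapping he describes, assuming the designation and the admin node name share a line in 'mmdiag --network' output (the column layout varies by GPFS release, so adjust the fields to taste):

--
# Rough sketch: print "<c0nX>  hostname" pairs from mmdiag --network.
# Assumes both appear on the same line; field positions differ between releases.
mmdiag --network | awk '/<c[0-9]+n[0-9]+>/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^<c[0-9]+n[0-9]+>$/) print $i, $1
}'
--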