From stuartb at 4gh.net Fri Oct 3 18:19:08 2014
From: stuartb at 4gh.net (Stuart Barkley)
Date: Fri, 3 Oct 2014 13:19:08 -0400 (EDT)
Subject: [gpfsug-discuss] filesets and mountpoint naming
Message-ID:

Resent: First copy sent Sept 23. Maybe stuck in a moderation queue?

When we first started using GPFS we created several filesystems and just directly mounted them where it seemed appropriate. We have something like:

/home
/scratch
/projects
/reference
/applications

We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now).

We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems.

We have multiple compute clusters with multiple gpfs systems; one cluster has a traditional gpfs system and a separate gss system, which will obviously need multiple mount points. We also want to consider possible future cross-cluster mounts.

Some thoughts are to just do filesystems as:

/gpfs01, /gpfs02, etc.
/mnt/gpfs01, etc.
/mnt/clustera/gpfs01, etc.

What have other people done? Are you happy with it? What would you do differently?

Thanks,
Stuart
--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone

From bbanister at jumptrading.com Mon Oct 6 16:17:44 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Mon, 6 Oct 2014 15:17:44 +0000
Subject: [gpfsug-discuss] filesets and mountpoint naming
In-Reply-To:
References:
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com>

There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc.) if there is a file system issue that would cause these commands to hang.

Beyond that, the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley
Sent: Friday, October 03, 2014 12:19 PM
To: gpfsug-discuss at gpfsug.org
Subject: [gpfsug-discuss] filesets and mountpoint naming

Resent: First copy sent Sept 23. Maybe stuck in a moderation queue?

When we first started using GPFS we created several filesystems and just directly mounted them where it seemed appropriate. We have something like:

/home
/scratch
/projects
/reference
/applications

We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now).

We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems.

We have multiple compute clusters with multiple gpfs systems; one cluster has a traditional gpfs system and a separate gss system, which will obviously need multiple mount points. We also want to consider possible future cross-cluster mounts.

Some thoughts are to just do filesystems as:

/gpfs01, /gpfs02, etc.
/mnt/gpfs01, etc.
/mnt/clustera/gpfs01, etc.

What have other people done? Are you happy with it? What would you do differently?
Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bbanister at jumptrading.com Mon Oct 6 16:36:17 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 15:36:17 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Oct 6 16:40:45 2014 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 6 Oct 2014 15:40:45 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com> Hi Stuart, We have a very similar setup. I use /gpfs01, /gpfs02 etc. and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. 
/mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? 
Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. 
> However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? 
from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation.

Cheers,
-Bryan

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014
From: kgunda at in.ibm.com (Kalyan Gunda)
Date: Tue, 7 Oct 2014 10:33:07 +0530
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID:

Hi Bryan,

AFM supports GPFS multi-cluster, and we have customers already using this successfully. Are you using the GPFS backend? Can you explain your configuration in detail? If ls is hung it would have generated some long waiters. Maybe this should be pursued separately via a PMR. You can ping me the details directly if needed, along with opening a PMR per the IBM service process.

As far as prefetch is concerned, right now it is limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multiple nodes to pull in data based on configuration. The "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via the mmchfileset cmd (the mmchfileset pubs don't show this param for some reason, I will have that updated.)

eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5
Fileset prefetchIW changed.

List the change:
mmlsfileset fs1 prefetchIW --afm -L
Filesets in file system 'fs1':

Attributes for fileset prefetchIW:
===================================
Status                              Linked
Path                                /gpfs/fs1/prefetchIW
Id                                  36
afm-associated                      Yes
Target                              nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch
Mode                                independent-writer
File Lookup Refresh Interval        30 (default)
File Open Refresh Interval          30 (default)
Dir Lookup Refresh Interval         60 (default)
Dir Open Refresh Interval           60 (default)
Async Delay                         15 (default)
Last pSnapId                        0
Display Home Snapshots              no
Number of Gateway Flush Threads     5
Prefetch Threshold                  0 (default)
Eviction Enabled                    yes (default)

AFM parallel i/o can be set up such that multiple GW nodes can be used to pull in data. More details are available here:
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm
and this link outlines tuning params for parallel i/o along with others:
http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning

Regards
Kalyan
GPFS Development
EGL D Block, Bangalore

From: Bryan Banister
To: gpfsug main discussion list
Date: 10/06/2014 09:57 PM
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations
Sent by: gpfsug-discuss-bounces at gpfsug.org

We are using 4.1.0.3 on the cluster with the AFM filesets,
-Bryan

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme
Sent: Monday, October 06, 2014 11:28 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Hi Bryan,

in 4.1 AFM uses multiple threads for reading data, this was different in 3.5. what version are you using?

thx. Sven

On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote:

Just an FYI to the GPFS user community,

We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide.

Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset:

GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching:
v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset).

We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation.
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
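[Editorial sketch, not part of the original post: for anyone triaging a similar AFM hang, a minimal first pass with standard GPFS 4.1 tooling might look like the following. The file system name gpfs_new and fileset name afm_test are made-up placeholders.]

    # Check for long waiters on the gateway and NSD nodes of the cache (new) cluster
    mmdiag --waiters

    # Show the AFM state and queue of the fileset (Active, Dirty, Disconnected, Unmounted, ...)
    mmafmctl gpfs_new getstate -j afm_test

    # Show the fileset's AFM attributes, target and mode
    mmlsfileset gpfs_new afm_test --afm -L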
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
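[Editorial sketch, not part of the original posts: a rough illustration of the parallel rsync approach described above. The mount points /gpfs_old and /gpfs_new and the eight-way parallelism are hypothetical, and a final single rsync pass (e.g. with --delete) would still be needed at cutover while applications are stopped.]

    # One rsync stream per top-level project directory, eight at a time on this node
    find /gpfs_old/projects -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | \
      xargs -P 8 -I{} rsync -aHAXS --numeric-ids /gpfs_old/projects/{}/ /gpfs_new/projects/{}/

    # Spread different top-level trees across several nodes (ssh, pdsh, or the batch
    # scheduler) to use more NSD server bandwidth than a single node can drive.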
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) 
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 09 Oct 2014 13:02:44 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
Message-ID: <54367964.1050900@ebi.ac.uk>

Hello everyone,

Suppose we want to build a new GPFS storage system using SAN-attached storage arrays, but instead of putting metadata on a shared storage array, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of server failure?

To make it more clear: if a server fails, I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data?

Thanks in advance,
Salvatore Di Nardo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Thu, 9 Oct 2014 19:31:28 +0000
Subject: [gpfsug-discuss] GPFS RFE promotion
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>

Just wanted to pass my GPFS RFE along:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

Description:
GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis.

Use case:
This could be used for a very large number of file system management applications, including:
1) SOBAR (Scale-Out Backup And Restore)
2) Data Security Auditing and Monitoring applications
3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources.
4) Application file system access profiling

Please vote for it if you feel it would also benefit your operation, thanks,
-Bryan

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From service at metamodul.com Fri Oct 10 13:21:43 2014 From: service at metamodul.com (service at metamodul.com) Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST) Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de> > Bryan Banister hat am 9. Oktober 2014 um 21:31 > geschrieben: > > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgp at psu.edu Fri Oct 10 16:04:02 2014 From: pgp at psu.edu (Phil Pishioneri) Date: Fri, 10 Oct 2014 11:04:02 -0400 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <5437F562.1080609@psu.edu> On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 15:08:04 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <5437F562.1080609@psu.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

Id like this to see hot files

On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister <bbanister at jumptrading.com> wrote:
> Hmm... I didn't think to use the DMAPI interface. That could be a nice
> option. Has anybody done this already and are there any examples we could
> look at?
>
> Thanks!
> -Bryan

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

Ben,

to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653

thx. Sven

On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca <bdeluca at gmail.com> wrote:
> Id like this to see hot files

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: Re: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>

Hi Salvatore,

We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:

* Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.)

* Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available.

* When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start -a).

I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work.

Thx
Paul Sanchez

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID: 

As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file system to gather the values:

-- heat.pol --
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) )
-- heat.pol --

Produces output similar to this:

/gpfs/.../specFile.pyc 535089836 5892
/gpfs/.../syspath.py 528685287 806
/gpfs/---/bwe.py 528160670 4607

Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values.
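If it helps, the way I kick that off is just a list-only policy run, something along these lines (the device name /gpfs/fs1 and the /tmp/fileheat prefix are only placeholders here, so substitute your own; -I defer means no external script is executed, the candidate lists are just written out):

mmapplypolicy /gpfs/fs1 -P heat.pol -f /tmp/fileheat -I defer

If I remember right the list ends up in a file named something like /tmp/fileheat.list.fh, ordered by the FILE_HEAT weight, and from there it's easy to post-process with your favourite scripting language.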
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com

Bob Oesterlin Sr Storage Engineer, Nuance Communications

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

querying this through the policy engine is far too late to do anything useful with it

On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme <oehmes at gmail.com> wrote:
> Ben,
>
> to get lists of 'Hot Files' turn File Heat on , some discussion about it
> is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653
>
> thx. Sven

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com>

I agree with Ben, I think.

I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path.

Is there a simple DMAPI daemon that would log the file system namespace changes that we could use?

If so are there any limitations?

And is it possible to set this up in an HA environment?

Thanks!
-Bryan

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: Re: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com>

We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.)

This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems.

-Paul Sanchez

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: 

The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS.
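one practical note if anyone wants to experiment with an agent like this : DMAPI has to be enabled on the filesystem before a session can attach to it, which is just something like the following (gpfs01 is only an example device name here) :

mmlsfs gpfs01 -z     (shows whether DMAPI is enabled)
mmchfs gpfs01 -z yes     (enables it ; if i remember right the filesystem has to be unmounted everywhere to change this flag)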
it's a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary

just to be clear, there is no Support for this code. we obviously Support the DMAPI interface, but the code that exposes the API is nothing we provide Support for.

thx. Sven

On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister <bbanister at jumptrading.com> wrote:
> I don't want to use the ILM policy engine as that puts a direct workload
> against the metadata storage and server resources. We need something
> out-of-band, out of the file system operational path.
>
> Is there a simple DMAPI daemon that would log the file system namespace
> changes that we could use?

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com>

A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!)
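Just to make it concrete what we are trying to avoid: the only way I know to get a change list out of GPFS today is a full policy scan keyed on modification time, roughly like the sketch below (the device name, policy file name, and the one-day window are made-up placeholders for the example), and it is exactly this scan of all the file system metadata that adds the workload I mentioned:

rule 'cl0' external list 'changed' exec ''
rule 'cl1' list 'changed' show( varchar(MODIFICATION_TIME) ) where (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS

mmapplypolicy /gpfs/fs1 -P changed.pol -f /tmp/changed -I defer

And even then it only sees files that still exist at scan time, so deletes and renames slip through, which is why a proper operation log from the file system itself would be so much better.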
Thanks,
-Bryan

Ps. Hajo said that he couldn't access the RFE to vote on it:

I would like to support the RFE but I get: "You cannot access this page because you do not have the proper authority." Cheers Hajo

Here is what the RFE website states:

Bookmarkable URL: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others.
-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Sat, 11 Oct 2014 10:37:10 +0100 Subject: Re: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <5438FA46.7090902@ebi.ac.uk>

Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups, each of them with a drive, and to set 3 metadata replicas as the default.

I had not considered that the vdisks could be offline after a reboot or failure, so that's a good point, but after a failure or even a standard reboot the server and the cluster have to be checked anyway, and I always check the vdisk status, so no big deal. Your answer made me consider another thing: once I put them back online, will they be restriped automatically, or should I run 'mmrestripefs' every time to verify/correct the replicas?

I understand that using local disks sounds strange; in fact our first idea was just to add some SSDs to the shared storage, but then we considered that the SAS cables could be a huge bottleneck. The cost difference is not huge, and FusionIO locally on the servers would make the metadata just fly.

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From service at metamodul.com Sun Oct 12 17:03:56 2014 From: service at metamodul.com (MetaService) Date: Sun, 12 Oct 2014 18:03:56 +0200 Subject: Re: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <1413129836.4846.9.camel@titan>

My preferred naming convention is to use the cluster name, or part of it, as the base directory for all GPFS mounts.

Example: Clustername=c1_eum would mean that /c1_eum/ is the base directory for all GPFS filesystems of cluster c1_eum. In case a second local cluster exists, its root mount point would be /c2_eum/. Even in the case of mounting remote clusters a naming collision is not very likely.

BTW: For accessing the final directories /.../scratch ... the user should not rely on the mount points but on provided variables: CLS_HOME=/... CLS_SCRATCH=/....

hth Hajo

From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Fri, 10 Oct 2014 17:48:24 +0100 Subject: Re: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <54380DD8.2020909@ocf.co.uk>

Hi Salvatore,

Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whilst the GPFS cluster renegotiates. The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS; once you've started the disk again it will rediscover and scan the metadata for any missing updates, and these updates are then repaired/replicated again.

Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc

Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc

OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG.

This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.

-------------- next part -------------- An HTML attachment was scrubbed... URL: 
-------------- next part -------------- A non-text attachment was scrubbed... Name: lhorrocks-barlow.vcf Type: text/x-vcard Size: 388 bytes Desc: not available URL: 

From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 13 Oct 2014 13:10:17 +0200 Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Message-ID: 

GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Oct 14th 11:15-12:05 Room 18 http://sched.co/1uMYEWK

Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany

From service at metamodul.com Mon Oct 13 16:49:44 2014 From: service at metamodul.com (service at metamodul.com) Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST) Subject: Re: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany In-Reply-To: References: Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de>

Hallo Frank, the announcement is a little bit too late for me. It would be nice if you could share your talk later. cheers Hajo

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 16:23:01 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> Message-ID: <543D3FD5.1060705@ebi.ac.uk> On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs > and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. 
We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 09:22:41 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D3FD5.1060705@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: your GSS code version is very backlevel. can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all thx. 
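One way to capture everything requested here in a single pass might look like the sketch below (the recovery-group name parsing assumes the plain mmlsrecoverygroup summary layout, with numeric vdisk counts in columns two and three, and the output directory is arbitrary -- both are local assumptions to adjust):

#!/bin/bash
# Sketch: collect mmlsconfig, mmlsfs all, and a detailed per-recovery-group
# report (pdisks included) into one directory for attaching to the thread.
BIN=/usr/lpp/mmfs/bin
OUT=/tmp/gss-diag-$(date +%Y%m%d-%H%M)
mkdir -p "$OUT"

"$BIN/mmlsconfig" > "$OUT/mmlsconfig.txt"
"$BIN/mmlsfs" all > "$OUT/mmlsfs-all.txt"

# List the recovery groups, then dump each one with -L --pdisk as asked above.
# The awk filter keeps only summary rows whose vdisk counts are numeric.
"$BIN/mmlsrecoverygroup" | awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {print $1}' |
while read -r rg; do
    "$BIN/mmlsrecoverygroup" "$rg" -L --pdisk > "$OUT/recoverygroup-$rg.txt"
done

A tarball of that directory then gives one attachment per cluster instead of pasting each command's output separately.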
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug-discuss at gpfsug.org Date: 10/14/2014 08:23 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org On 14/10/14 15:51, Sven Oehme wrote: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 17:39:18 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: <543D51B6.3070602@ebi.ac.uk> Thanks in advance for your help. We have 6 RG: recovery group vdisks vdisks servers ------------------ ----------- ------ ------- gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk Check the attached file for RG details. 
Following mmlsconfig: [root at gss01a ~]# mmlsconfig Configuration data for cluster GSS.ebi.ac.uk: --------------------------------------------- myNodeConfigNumber 1 clusterName GSS.ebi.ac.uk clusterId 17987981184946329605 autoload no dmapiFileHandleSize 32 minReleaseLevel 3.5.0.11 [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] pagepool 38g nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m numaMemoryInterleave yes prefetchPct 5 maxblocksize 16m nsdRAIDTracks 128k ioHistorySize 64k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 64k nsdRAIDFastWriteFSMetadataLimit 256k nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 4096 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 syncWorkerThreads 256 tscWorkerPool 64 nsdInlineWriteMax 32k maxFilesToCache 12k maxStatCache 512 maxGeneralThreads 1280 flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 maxMBpS 16000 maxReceiverThreads 128 worker1Threads 1024 worker3Threads 32 [common] cipherList AUTHONLY socketMaxListenConnections 1500 failureDetectionTime 60 [common] adminMode central File systems in cluster GSS.ebi.ac.uk: -------------------------------------- /dev/gpfs1 For more configuration paramenters i also attached a file with the complete output of mmdiag --config. and mmlsfs: File system attributes for /dev/gpfs1: ====================================== flag value description ------------------- ------------------------ ----------------------------------- -f 32768 Minimum fragment size in bytes (system pool) 262144 Minimum fragment size in bytes (other pools) -i 512 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1000 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 8388608 Block size (other pools) -Q user;group;fileset Quotas enforced user;group;fileset Default quotas enabled --filesetdf no Fileset df enabled? -V 13.23 (3.5.0.7) File system version --create-time Tue Mar 18 16:01:24 2014 File system creation time -u yes Support for large LUNs? -z no Is DMAPI enabled? -L 4194304 Logfile size -E yes Exact mtime mount option -S yes Suppress atime mount option -K whenpossible Strict replica allocation option --fastea yes Fast external attributes enabled? 
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4 ! pagepool 40802189312 pagepoolMaxPhysMemPct 75 prefetchAggressiveness 2 prefetchAggressivenessRead -1 prefetchAggressivenessWrite -1 ! prefetchPct 5 prefetchThreads 72 readReplicaPolicy default remoteMountTimeout 10 sharedMemLimit 0 sharedMemReservePct 15 sidAutoMapRangeLength 15000000 sidAutoMapRangeStart 15000000 ! socketMaxListenConnections 1500 socketRcvBufferSize 0 socketSndBufferSize 0 statCacheDirPct 10 subnets ! syncWorkerThreads 256 tiebreaker system tiebreakerDisks tokenMemLimit 536870912 treatOSyncLikeODSync 1 tscTcpPort 1191 ! tscWorkerPool 64 uidDomain GSS.ebi.ac.uk uidExpiration 36000 unmountOnDiskFail no useDIOXW 1 usePersistentReserve 0 verbsLibName libibverbs.so verbsPorts verbsRdma disable verbsRdmaCm disable verbsRdmaCmLibName librdmacm.so verbsRdmaMaxSendBytes 16777216 verbsRdmaMinBytes 8192 verbsRdmaQpRtrMinRnrTimer 18 verbsRdmaQpRtrPathMtu 2048 verbsRdmaQpRtrSl 0 verbsRdmaQpRtrSlDynamic 0 verbsRdmaQpRtrSlDynamicTimeout 10 verbsRdmaQpRtsRetryCnt 6 verbsRdmaQpRtsRnrRetry 6 verbsRdmaQpRtsTimeout 18 verbsRdmaSend 0 verbsRdmasPerConnection 8 verbsRdmasPerNode 0 verbsRdmaTimeout 18 verifyGpfsReady 0 ! worker1Threads 1024 ! worker3Threads 32 writebehindThreshold 524288

From oehmes at us.ibm.com Tue Oct 14 18:23:50 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:23:50 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D51B6.3070602@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

You are basically running GSS 1.0 code, while the current version is GSS 2.0 (which replaced version 1.5 two months ago).

GSS 1.5 and 2.0 have several enhancements in this space, so I strongly encourage you to upgrade your systems.

If you can describe your workload a bit, there may also be additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo
> To: gpfsug main discussion list
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> Thanks in advance for your help.
>
> We have 6 RG:
> recovery group        vdisks  vdisks  servers
> ------------------  -----------  ------  -------
> gss01a                    4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                    4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                    4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                    4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                    4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                    4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
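For reference, both the recovery-group report in the attachment and a quick per-server count of these log-append waiters can be regenerated on demand. The following is only a sketch, assuming it is run as root on a node that can reach the recovery-group servers over ssh (the cluster's configured remote shell) and that the group and server names match the listing above:

    # per-recovery-group detail, i.e. the mmlsrecoverygroup output requested earlier in the thread
    for rg in gss01a gss01b gss02a gss02b gss03a gss03b; do
        mmlsrecoverygroup $rg -L --pdisk
    done

    # one line per server: host name and the number of threads currently
    # waiting on the vdisk log ('wait for permission to append to log')
    for n in gss01a gss01b gss02a gss02b gss03a gss03b; do
        echo -n "$n: "
        ssh $n "mmdiag --waiters | grep -c 'permission to append to log'"
    done

A count that stays high on a single server, as in the gss02b output in this thread, matches the fast-write-log contention described elsewhere in the discussion.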
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically > in the same row, and due to network constraints was not possible to > put them in the same subnet). The packets are routed, but should not > be a problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
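When a dump like the one above scrolls past, it is usually easier to work from an aggregated view than from the raw waiter list. The sketch below is illustrative only: it assumes passwordless ssh between the nodes (the cluster's mmlscluster output further down shows /usr/bin/ssh as its remote shell command), that mmdiag lives in the standard /usr/lpp/mmfs/bin location, and that the waiter lines follow the format shown above; the node names are taken from the mmlscluster output quoted later in the thread.

    # Summarise current waiters by reason across the GSS recovery group servers
    for n in gss01a gss01b gss02a gss02b gss03a gss03b; do
        ssh "$n" /usr/lpp/mmfs/bin/mmdiag --waiters
    done | sed -n "s/.*reason '\(.*\)'.*/\1/p" | sort | uniq -c | sort -rn

A single reason dominating the count -- here 'wait for permission to append to log' on the VdiskLogAppendCondvar -- points at serialisation on the GSS fast write log rather than at the disks behind it, which matches the explanation Sven gives later in the thread.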
URL: From zgiles at gmail.com Tue Oct 14 18:32:50 2014 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 14 Oct 2014 13:32:50 -0400 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. >> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. 
>> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. >> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? 
>> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. 
We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> [about 50 NSDThread waiters snipped -- all waiting on the same ThCond 0x7F2114005750 (VdiskLogAppendCondvar), reason 'wait for permission to append to log'; the full list is quoted earlier in the thread] >> >> Does it means that the vdisk logs are struggling?
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:38:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: i personally don't know, i am in GPFS Research, not in support :-) but have you tried to contact your sales rep ? if you are not successful with that, shoot me a direct email with details about your company name, country and customer number and i try to get you somebody to help. thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 10/14/2014 10:33 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. 
>> >> We have 6 RG: [recovery group table, mmlsconfig, mmdiag --config and mmlsfs output snipped -- identical to the output quoted earlier in the thread] >> >> Regards, >> Salvatore >> >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx.
Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> [about 50 NSDThread waiters snipped -- all waiting on the same ThCond 0x7F2114005750 (VdiskLogAppendCondvar), reason 'wait for permission to append to log'; the full list is quoted earlier in the thread] >> >> Does it means that the vdisk logs are struggling?
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
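For context on the layout Tony describes, the 'quorum desc disk' at a third site is typically a small NSD that carries only a file system descriptor replica (and can double as a tiebreaker disk for node quorum), so that the loss of either campus leaves both quorum and a descriptor copy intact. Purely as an illustration -- the NSD, device and server names here are invented -- such a disk could be described to mmcrnsd with a stanza along these lines:

    %nsd:
      nsd=site3_desc01
      device=/dev/sdX
      servers=site3-quorum-node
      usage=descOnly
      failureGroup=3

usage=descOnly keeps data and metadata off the disk, while placing it in its own failure group gives GPFS somewhere outside the two main failure domains to hold one copy of the file system descriptor.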
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
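Whether CTDB is actually in play is easy to confirm on one of the Samba nodes; a rough sketch, assuming the ctdb and Samba tools are installed in their default paths:

    ctdb version                                 # prints the 'CTDB version: ...' string Tony quotes below
    ctdb status                                  # every node in the CTDB cluster should report OK/healthy
    smbd -b | grep -i cluster                    # was smbd built with cluster support?
    testparm -s 2>/dev/null | grep -i clustering # 'clustering = Yes' when smbd runs under CTDB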
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
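Coming back to the ctdb/Samba questions above, a minimal sketch of the sort of configuration involved, assuming a RHEL-style layout and CTDB 2.5.x-era options; the node names, addresses and paths are only illustrative:

    /etc/ctdb/nodes (one private address per NAS node):
        10.0.104.11
        10.0.104.12

    /etc/ctdb/public_addresses (floating client IPs, one interface per VLAN, which covers the multiple-interface question):
        192.168.103.21/24 eth1
        192.168.104.21/24 eth0

    /etc/sysconfig/ctdb:
        CTDB_RECOVERY_LOCK=/gpfs/gpfs01/.ctdb/reclock    # must live in GPFS so every node sees the same lock
        CTDB_MANAGES_SAMBA=yes
        CTDB_MANAGES_NFS=yes                             # only if ctdb should also start/stop and monitor NFS

    smb.conf [global]:
        clustering = yes
        vfs objects = gpfs

With something like that in place ctdb fails the public addresses over between nodes and restarts Samba (and optionally NFS) as needed; the details will obviously differ between a hand-rolled cluster and the SoNAS/v7000 Unified appliances mentioned above.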
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
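To make the File Heat suggestion above concrete, a rough sketch (the device, policy and path names are invented, and the SHOW fields are just examples): heat tracking is switched on with mmchconfig, and a LIST rule weighted by FILE_HEAT run through a deferred mmapplypolicy scan dumps the hottest files:

    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # hot.pol
    RULE EXTERNAL LIST 'hot' EXEC ''
    RULE 'listHot' LIST 'hot' WEIGHT(FILE_HEAT)
         SHOW(VARCHAR(FILE_HEAT) || ' ' || VARCHAR(FILE_SIZE))

    mmapplypolicy gpfs01 -P hot.pol -f /tmp/heat -I defer

The candidates end up in /tmp/heat.list.hot with the heat value in the SHOW column. As noted in the thread, this is still the policy engine, i.e. a periodic metadata scan, not the out-of-band operation log the RFE is asking for.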
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year. 
Here is basic agenda that was provided: GPFS/Elastic Storage User Group Monday, November 17, 2014 3:00 PM-5:00 PM: GPFS/Elastic Storage User Group [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] IBM Software Defined Storage strategy update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Customer presentations [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Future directions such as object storage and OpenStack integration [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage server update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage roadmap (*NDA required) 5:00 PM: Reception Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance. I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature. Thanks! -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 76 bytes Desc: image001.gif URL: From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org> Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: > > I reviewed my RFE request again and notice that it has been marked as > ?Private? and I think this is preventing people from voting on this > RFE. 
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 24 Oct 2014 18:58:07 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com> It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem that I had with my configuration is that the NSD client cluster was not completely updated to GPFS 4.1.0-3, as there are a few nodes still running 3.5.0-20 in the cluster which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration ?requirement? isn?t documented in the Advanced Admin Guide, but it makes sense that this is required since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD Client cluster and the configuration works as desired. Thanks Kalyan and others for their feedback. Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one Gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 x " Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node. Thanks! 
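At the roughly 100 queue operations a second seen here, a 12 million file tree works out to around 120,000 seconds, i.e. the 33-plus hours mentioned below, which is why fanning the copy out matters. The parallel rsync approach referred to can be as simple as one rsync per top-level directory (a rough sketch only: the paths and parallelism factor are made up, it assumes reasonably balanced top-level directories, and a final rsync pass with --delete is still needed at cutover):

    cd /gpfs/oldfs/projects
    find . -mindepth 1 -maxdepth 1 -print0 | \
        xargs -0 -P8 -I{} rsync -a {} /gpfs/newfs/projects/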
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
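As a point of comparison, a minimal sketch of the parallel rsync approach mentioned above might look like the following, assuming the namespace can be split by top-level directory and that the same pattern is repeated on several nodes, each given its own subset of directories (the paths and the concurrency level are illustrative):

  # Run up to 8 rsyncs at a time on this node, one per top-level directory.
  # A final pass with the same command during the cutover window picks up
  # whatever changed since the previous run.
  cd /gpfs_old/projects || exit 1
  ls -d */ | xargs -P 8 -I{} rsync -aHAX --numeric-ids {} /gpfs_new/projects/{}

Note that rsync's -A/-X handling may not carry every GPFS-specific attribute (for example NFSv4 ACLs), so attribute fidelity should be verified on a sample of files before relying on this for the final sync.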
Cheers,
-Bryan

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From chair at gpfsug.org Wed Oct 29 13:59:40 2014
From: chair at gpfsug.org (Jez Tucker (Chair))
Date: Wed, 29 Oct 2014 13:59:40 +0000
Subject: [gpfsug-discuss] Storagebeers, Nov 13th
Message-ID: <5450F2CC.3070302@gpfsug.org>

Hello all,

I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members.

http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/

I'll be popping along. Hopefully see you there.

Jez

From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014
From: Jared.Baker at uwyo.edu (Jared David Baker)
Date: Wed, 29 Oct 2014 15:31:31 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com>

Hello all,

I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs.
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
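One configuration detail worth ruling out when device paths change across a reboot is an nsddevices user exit that prints a fixed list of names rather than scanning what the operating system currently presents. A minimal sketch of a dynamic variant, modelled on the sample shipped under /usr/lpp/mmfs/samples and assuming the multipath aliases keep the dcs3800u31[ab]_lun naming shown above, would be:

  #!/bin/ksh
  # /var/mmfs/etc/nsddevices (sketch only)
  # Emit a "device devtype" pair for every matching multipath alias that
  # exists right now, instead of echoing a hard-coded list.
  for dev in $(ls /dev/mapper 2>/dev/null | egrep '^dcs3800u31[ab]_lun[0-9]+$')
  do
      echo "mapper/$dev dmm"
  done
  # Returning 0 tells GPFS to use only the devices printed above and to
  # skip its default device discovery.
  return 0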
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. 
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
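For characterising the state before (or while) support is engaged, a couple of read-only checks can be useful; this is a sketch only, not a recovery procedure, and the device name is simply one of the LUNs from the earlier multipath output:

  # Does the start of the LUN still carry recognisable NSD descriptor text?
  # (Absence of a match is suggestive, not conclusive.)
  dd if=/dev/mapper/dcs3800u31a_lun0 bs=512 count=64 2>/dev/null | strings | grep -i nsd

  # Has a GPT label re-appeared on a LUN that had been given to GPFS whole?
  parted -s /dev/mapper/dcs3800u31a_lun0 print

Neither command writes to the device, so they are safe to run while the investigation is still open.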
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I?ve created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don?t seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
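For anyone landing on this thread later, Ed's checklist above boils down to a few read-only commands, collected here as a single sketch. The node and filesystem names (mminsd6, gscratch) are simply the ones visible in this thread, and the mmsdrfs copy at the end only helps when the problem is the node's GPFS configuration files rather than the disks themselves.

--
#!/bin/bash
# Read-only triage along the lines Ed suggests (sketch only; adjust names).
multipath -l                               # are the dcs3800u31* maps present?
cat /proc/partitions                       # does the kernel still see the sdX devices?
ls /dev/mapper | egrep '_lun[0-9]+'        # do the friendly multipath names exist?
/usr/lpp/mmfs/bin/mmlsnsd -X -f gscratch   # GPFS view: Device/Devtype should not be "-"

# If only this node's GPFS configuration were damaged, copying mmsdrfs from a
# healthy node (or running mmsdrrestore -p <healthy node>) is the quick fix Ed
# mentions. Here mminsd6 is assumed to be the healthy sibling NSD server.
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
--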
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
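A note on the dd output above: "EFI PART" is the signature of a GPT header, so the first sectors of dm-0 now hold a GUID partition table where GPFS would normally have left an "NSD descriptor" string (Sven shows what a healthy descriptor looks like in his next reply). A couple of read-only ways to confirm the same thing, sketched with the device name from Sven's command:

--
# Look for either a surviving NSD descriptor or a GPT signature in the first 32 KiB.
dd if=/dev/dm-0 bs=1k count=32 2>/dev/null | strings | grep -e 'NSD descriptor' -e 'EFI PART'

# Standard tools will also report a GPT label if one has been written over the LUN;
# blkid -p typically shows PTTYPE="gpt" in that case.
blkid -p /dev/dm-0
parted -s /dev/dm-0 print
--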
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
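Sven's dd example checks a single device; since all twelve LUNs look the same to mmlsnsd, a small loop (a sketch reusing the mapper names from the nsddevices output earlier in the thread) shows at a glance whether any descriptor survived:

--
#!/bin/bash
# Read-only scan of every DCS3800 LUN for a GPFS NSD descriptor.
# Prints the mapper name plus the first matching label string, if any.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    printf '%s: ' "$dev"
    dd if="$dev" bs=1k count=32 2>/dev/null | strings \
        | grep -m1 -e 'NSD descriptor' -e 'EFI PART' \
        || echo 'no recognizable label in the first 32 KiB'
done
--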
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 
active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
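For readers who have not met the /var/mmfs/etc/nsddevices user exit mentioned in the report above: it is a site-supplied script that GPFS consults during NSD device discovery, and it is expected to print one "device devtype" line per candidate device. The sketch below is written in the spirit of the script Jared posts later in this thread; the /dev/mapper naming pattern is specific to his DCS3800 multipath setup and is an assumption here.

--
#!/bin/ksh
# Minimal nsddevices sketch: list the dm-multipath aliases that back the
# NSDs so GPFS can map NSD names to local block devices. Site-specific.
CONTROLLER_REGEX='[ab]_lun[0-9]+'

for dev in $( /bin/ls /dev/mapper | egrep "$CONTROLLER_REGEX" )
do
        # "dmm" marks the device as a device-mapper multipath device
        echo "mapper/$dev dmm"
done

# Return 0 to bypass the built-in discovery (/usr/lpp/mmfs/bin/mmdevdiscover);
# returning non-zero would let the built-in discovery run as well.
return 0
--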
URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
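To put Sven's check in context: on a healthy NSD the first blocks of the device contain a readable "NSD descriptor ... created by GPFS" string, while Jared's dd output above returns only the "EFI PART" GPT signature. A small read-only sweep of that same check across all of the LUNs might look like the sketch below; the /dev/mapper names are taken from the outputs earlier in the thread and are an assumption on any other system.

--
#!/bin/ksh
# Dump the first 32 KiB of each LUN and report whether a GPFS NSD
# descriptor or a GPT signature is visible. Purely a read-only check.
for dev in /dev/mapper/dcs3800u31[ab]_lun*
do
        out=$( dd if="$dev" bs=1k count=32 2>/dev/null | strings )
        if echo "$out" | grep -q "NSD descriptor"; then
                echo "$dev: NSD descriptor present"
        elif echo "$out" | grep -q "EFI PART"; then
                echo "$dev: GPT signature found, NSD descriptor missing"
        else
                echo "$dev: no NSD descriptor or GPT signature in first 32 KiB"
        fi
done
--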
URL: From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. 
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch:

--
[root at mmmnsd5 ~]# mmlsnsd -X -f gscratch

 Disk name           NSD volume ID      Device   Devtype  Node name        Remarks
---------------------------------------------------------------------------------------------------
 dcs3800u31a_lun0    0A62001B54235577   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun0    0A62001B54235577   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun10   0A62001C542355AA   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun10   0A62001C542355AA   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun2    0A62001C54235581   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun2    0A62001C54235581   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun4    0A62001B5423558B   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun4    0A62001B5423558B   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun6    0A62001C54235595   -        -        mminsd6.infini   (not found) server node
 dcs3800u31a_lun6    0A62001C54235595   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun8    0A62001B5423559F   -        -        mminsd5.infini   (not found) server node
 dcs3800u31a_lun8    0A62001B5423559F   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun1    0A62001B5423557C   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun1    0A62001B5423557C   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun11   0A62001C542355AF   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun11   0A62001C542355AF   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun3    0A62001C54235586   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun3    0A62001C54235586   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun5    0A62001B54235590   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun5    0A62001B54235590   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun7    0A62001C5423559A   -        -        mminsd6.infini   (not found) server node
 dcs3800u31b_lun7    0A62001C5423559A   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun9    0A62001B542355A4   -        -        mminsd5.infini   (not found) server node
 dcs3800u31b_lun9    0A62001B542355A4   -        -        mminsd6.infini   (not found) server node
[root at mmmnsd5 ~]#
--

I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this.

Thanks for all help and information.

Regards,

Jared

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
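For readers following along, the nsddevices user exit Jared describes is just a shell fragment that prints "device devicetype" pairs for GPFS device discovery. A minimal sketch, modelled loosely on the sample that ships with GPFS and assuming dm-multipath aliases under /dev/mapper (the dcs3800 pattern is illustrative, not taken from Jared's actual script):

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices (illustrative sketch only)
# Print one "device devicetype" line, relative to /dev, for each disk
# that GPFS device discovery should consider. Multipath aliases are "dmm".
for dev in /dev/mapper/dcs3800*
do
    echo "mapper/$(basename $dev) dmm"
done
# return 0 : the list above is complete, skip the built-in discovery
# return 1 : also run the built-in device discovery afterwards
return 0
--

Note that, as the rest of the thread shows, the script only tells GPFS where to look; if the NSD descriptor on disk has been overwritten, the disks still come back as "(not found)".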
URL: From jonathan at buzzard.me.uk Wed Oct 29 20:32:25 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 20:32:25 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <54514ED9.9030604@buzzard.me.uk> On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 20:47:51 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:47:51 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <54514ED9.9030604@buzzard.me.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: Jonathan, which script are you talking about? Thanks, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard Sent: Wednesday, October 29, 2014 2:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
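For anyone in the same situation who wants a read-only look before running any clearing script: standard Linux tools can report whether a GPT header (primary or backup) or other signature is sitting on an NSD. A hedged sketch, reusing one of the device names from this thread and assuming gdisk and util-linux are installed:

--
# list any partition-table or filesystem signatures on the device (read-only)
wipefs /dev/mapper/dcs3800u31a_lun0

# gdisk's partition table scan reports whether a primary and/or backup
# GPT header is present; -l is read-only
gdisk -l /dev/mapper/dcs3800u31a_lun0
--

Actually clearing anything is another matter; as discussed above, make sure the secondary GPT copy is dealt with before rebuilding, and follow the script from the developerWorks thread or IBM support guidance rather than improvising.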
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk  Wed Oct 29 21:01:06 2014
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 29 Oct 2014 21:01:06 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: 
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk>
Message-ID: <54515592.4050606@buzzard.me.uk>

On 29/10/14 20:47, Jared David Baker wrote:
> Jonathan, which script are you talking about?
>

The one here:

https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25

Use it for detecting and clearing that secondary GPT table. Never used it myself, of course; my disaster was caused by an idiot admin installing a new OS without mapping the disks out, who then hit yes, yes, yes when asked if he wanted to blank the disks, and the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM, 50TB and 80 million files down the swanny.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From mark.bergman at uphs.upenn.edu  Fri Oct 31 17:10:55 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 13:10:55 -0400
Subject: [gpfsug-discuss] mapping to hostname?
Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm>

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name and the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark

From bevans at pixitmedia.com  Fri Oct 31 17:32:45 2014
From: bevans at pixitmedia.com (Barry Evans)
Date: Fri, 31 Oct 2014 17:32:45 +0000
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <5453C7BD.8030608@pixitmedia.com>

I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout.

Cheers,
Barry
ArcaStream/Pixit Media

mark.bergman at uphs.upenn.edu wrote:
> Many GPFS logs & utilities refer to nodes via their name.
>
> I haven't found an "mm*" executable that shows the mapping between that
> name and the hostname.
>
> Is there a simple method to map the designation to the node's
> hostname?
>
> Thanks,
>
> Mark
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email.
From oehmes at us.ibm.com  Fri Oct 31 18:20:40 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Fri, 31 Oct 2014 11:20:40 -0700
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: 

Hi,

the official way to do this is

mmdiag --network

thx. Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From:    mark.bergman at uphs.upenn.edu
To:      gpfsug main discussion list
Date:    10/31/2014 10:11 AM
Subject: [gpfsug-discuss] mapping to hostname?
Sent by: gpfsug-discuss-bounces at gpfsug.org

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name an the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mark.bergman at uphs.upenn.edu  Fri Oct 31 18:57:44 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 14:57:44 -0400
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700."
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi>

In the message dated: Fri, 31 Oct 2014 11:20:40 -0700,
The pithy ruminations from Sven Oehme on to hostname?> were:
=> Hi,
=>
=> the official way to do this is mmdiag --network

OK. I'm now using:

	mmdiag --network | awk '{if ( $1 ~ /

=> thx. Sven
=>
=>
=> ------------------------------------------
=> Sven Oehme
=> Scalable Storage Research
=> email: oehmes at us.ibm.com
=> Phone: +1 (408) 824-8904
=> IBM Almaden Research Lab
=> ------------------------------------------
=>
=>
=>
=> From: mark.bergman at uphs.upenn.edu
=> To: gpfsug main discussion list
=> Date: 10/31/2014 10:11 AM
=> Subject: [gpfsug-discuss] mapping to hostname?
=> Sent by: gpfsug-discuss-bounces at gpfsug.org
=>
=>
=>
=> Many GPFS logs & utilities refer to nodes via their name.
=>
=> I haven't found an "mm*" executable that shows the mapping between that
=> name an the hostname.
=>
=> Is there a simple method to map the designation to the node's
=> hostname?
=>
=> Thanks,
=>
=> Mark
=>
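A small usage sketch tying the two suggestions in this thread together. The node identifier shown is hypothetical, and the exact output layout of these commands varies by GPFS release, so treat this as a pattern rather than a recipe:

--
# official route (per Sven): dump the cluster network table and search it
# for whichever node name or identifier appears in your log message
mmdiag --network | grep -i 'c0n9'

# older habit (per Barry), undocumented but carrying the same information
mmfsadm saferdump tscomm | grep -i 'c0n9'
--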
From Sandra.McLaughlin at astrazeneca.com  Mon Oct  6 16:40:45 2014
From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M)
Date: Mon, 6 Oct 2014 15:40:45 +0000
Subject: [gpfsug-discuss] filesets and mountpoint naming
In-Reply-To: 
References: 
Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com>

Hi Stuart,

We have a very similar setup. I use /gpfs01, /gpfs02 etc.
and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. 
We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. > However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. 
This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 10:33:07 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
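For reference, a minimal sketch of the parallel rsync fan-out Bryan describes in point 4 of his message above, splitting the namespace at the top-level directories. The paths and the degree of parallelism are placeholders, and files sitting directly at the top level would still need one extra rsync pass of their own:

--
#!/bin/bash
# hypothetical old and new file system paths
SRC=/gpfs_old/projects
DST=/gpfs_new/projects

cd "$SRC" || exit 1
# one rsync per top-level directory, eight at a time; add -H/-A/-X if
# hard links, ACLs and extended attributes need to be preserved
ls -d */ | xargs -P 8 -I{} rsync -a "$SRC/{}" "$DST/{}"
--

Run from several nodes against disjoint directory lists, this is the sort of transfer Bryan is comparing the single-node mmafmctl prefetch against.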
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) 
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Thu, 09 Oct 2014 13:02:44 +0100
Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
Message-ID: <54367964.1050900@ebi.ac.uk>

Hello everyone,

Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting the metadata on shared storage we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive), and for reliability replicate the metadata across all the servers. Will this work in case of a server failure?

To make it clearer: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data?

Thanks in advance
Salvatore Di Nardo

From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Thu, 9 Oct 2014 19:31:28 +0000
Subject: [gpfsug-discuss] GPFS RFE promotion
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>

Just wanted to pass my GPFS RFE along:

http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

Description:
GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis.

Use case:
This could be used for a very large number of file system management applications, including:
1) SOBAR (Scale-Out Backup And Restore)
2) Data Security Auditing and Monitoring applications
3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources.
4) Application file system access profiling

Please vote for it if you feel it would also benefit your operation, thanks,
-Bryan
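For comparison, the policy-engine workaround that this RFE is meant to replace looks roughly like the sketch below: a periodic scan that lists everything modified since the last run. The file system path, list name and time window are illustrative, the SQL should be checked against the ILM chapter of the Advanced Admin Guide, and the scan still has to walk all of the metadata, which is exactly the overhead the RFE wants to avoid:

-- changed-files.pol --
RULE 'chg1' EXTERNAL LIST 'changed' EXEC ''
/* list anything modified within the last day; match the window to the scan interval */
RULE 'chg2' LIST 'changed' WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
-- changed-files.pol --

# run the scan and leave the results in /tmp/changes.list.changed for post-processing
mmapplypolicy /gpfs01 -P changed-files.pol -f /tmp/changes -I defer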
From service at metamodul.com Fri Oct 10 13:21:43 2014
From: service at metamodul.com (service at metamodul.com)
Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST)
Subject: [gpfsug-discuss] GPFS RFE promotion
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de>

> Bryan Banister wrote on 9 October 2014 at 21:31:
>
> Just wanted to pass my GPFS RFE along:
>
> http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

I would like to support the RFE but I get: "You cannot access this page because you do not have the proper authority."

Cheers
Hajo

From pgp at psu.edu Fri Oct 10 16:04:02 2014
From: pgp at psu.edu (Phil Pishioneri)
Date: Fri, 10 Oct 2014 11:04:02 -0400
Subject: [gpfsug-discuss] GPFS RFE promotion
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <5437F562.1080609@psu.edu>

On 10/9/14 3:31 PM, Bryan Banister wrote:
>
> Just wanted to pass my GPFS RFE along:
>
> http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458
>
> *Description*:
>
> GPFS File System Manager should provide the option to log all file and
> directory operations that occur in a file system, preferably stored in
> a TSD (Time Series Database) that could be quickly queried through an
> API interface and command line tools. ...

The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum:

On 1/3/11 10:27 AM, dWForums wrote:
> Author:
> AlokK.Dhir
>
> Message:
> We have a proof of concept which uses DMAPI to listen to and passively log filesystem changes with a non-blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working.

-Phil

From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 10 Oct 2014 15:08:04 +0000
Subject: [gpfsug-discuss] GPFS RFE promotion
In-Reply-To: <5437F562.1080609@psu.edu>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu>
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com>

Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at?

Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister wrote: > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! 
> -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > >> Hmm... I didn't think to use the DMAPI interface. That could be a nice >> option. 
Has anybody done this already and are there any examples we could >> look at? >> >> Thanks! >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >> Sent: Friday, October 10, 2014 10:04 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >> >> On 10/9/14 3:31 PM, Bryan Banister wrote: >> > >> > Just wanted to pass my GPFS RFE along: >> > >> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >> > 0458 >> > >> > >> > *Description*: >> > >> > GPFS File System Manager should provide the option to log all file and >> > directory operations that occur in a file system, preferably stored in >> > a TSD (Time Series Database) that could be quickly queried through an >> > API interface and command line tools. ... >> > >> >> The rudimentaries for this already exist via the DMAPI interface in GPFS >> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >> DeveloperWorks forum: >> >> On 1/3/11 10:27 AM, dWForums wrote: >> > Author: >> > AlokK.Dhir >> > >> > Message: >> > We have a proof of concept which uses DMAPI to listens to and passively >> logs filesystem changes with a non blocking listener. This log can be used >> to generate backup sets etc. Unfortunately, a bug in the current DMAPI >> keeps this approach from working in the case of certain events. I am told >> 3.4.0.3 may contain a fix. We will gladly share the code once it is >> working. >> >> -Phil >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, and >> to please notify the sender immediately and destroy this email and any >> attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to the >> completeness or accuracy of this email or any attachments. This email is >> for informational purposes only and does not constitute a recommendation, >> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >> or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Hi Salvatore, We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints: * Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.) * Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available. * When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start ?a). I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work. Thx Paul Sanchez From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Salvatore Di Nardo Sent: Thursday, October 09, 2014 8:03 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? Hello everyone, Suppose we want to build a new GPFS storage using SAN attached storages, but instead to put metadata in a shared storage, we want to use FusionIO PCI cards locally on the servers to speed up metadata operation( http://www.fusionio.com/products/iodrive) and for reliability, replicate the metadata in all the servers, will this work in case of server failure? To make it more clear: If a server fail i will loose also a metadata vdisk. Its the replica mechanism its reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID: As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file systems to gather the values: -- heat.pol -- define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) -- heat.pol -- Produces output similar to this: /gpfs/.../specFile.pyc 535089836 5892 /gpfs/.../syspath.py 528685287 806 /gpfs/---/bwe.py 528160670 4607 Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values. 
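A rough sketch of that step (the prefix and path are placeholders, and the exact column layout of the generated list file should be checked before sorting on it):

# scan without moving anything; the deferred list ends up in /tmp/fileheat.list.fh
mmapplypolicy /gpfs/fs1 -P heat.pol -f /tmp/fileheat -I defer

# sort on whichever column carries the FILE_HEAT value to surface the hottest files
sort -nr -k4 /tmp/fileheat.list.fh | head -50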
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > >> Id like this to see hot files >> >> On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < >> bbanister at jumptrading.com> wrote: >> >>> Hmm... I didn't think to use the DMAPI interface. That could be a nice >>> option. Has anybody done this already and are there any examples we could >>> look at? >>> >>> Thanks! >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >>> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >>> Sent: Friday, October 10, 2014 10:04 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >>> >>> On 10/9/14 3:31 PM, Bryan Banister wrote: >>> > >>> > Just wanted to pass my GPFS RFE along: >>> > >>> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >>> > 0458 >>> > >>> > >>> > *Description*: >>> > >>> > GPFS File System Manager should provide the option to log all file and >>> > directory operations that occur in a file system, preferably stored in >>> > a TSD (Time Series Database) that could be quickly queried through an >>> > API interface and command line tools. ... >>> > >>> >>> The rudimentaries for this already exist via the DMAPI interface in GPFS >>> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >>> DeveloperWorks forum: >>> >>> On 1/3/11 10:27 AM, dWForums wrote: >>> > Author: >>> > AlokK.Dhir >>> > >>> > Message: >>> > We have a proof of concept which uses DMAPI to listens to and >>> passively logs filesystem changes with a non blocking listener. This log >>> can be used to generate backup sets etc. Unfortunately, a bug in the >>> current DMAPI keeps this approach from working in the case of certain >>> events. I am told 3.4.0.3 may contain a fix. We will gladly share the >>> code once it is working. >>> >>> -Phil >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged information. 
>>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, and >>> to please notify the sender immediately and destroy this email and any >>> attachments. Email transmission cannot be guaranteed to be secure or >>> error-free. The Company, therefore, does not make any guarantees as to the >>> completeness or accuracy of this email or any attachments. This email is >>> for informational purposes only and does not constitute a recommendation, >>> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >>> or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com> We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.) This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems. -Paul Sanchez -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Tuesday, September 23, 2014 11:47 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. 
its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister wrote: > I agree with Ben, I think. > > > > I don?t want to use the ILM policy engine as that puts a direct workload > against the metadata storage and server resources. We need something > out-of-band, out of the file system operational path. > > > > Is there a simple DMAPI daemon that would log the file system namespace > changes that we could use? > > > > If so are there any limitations? > > > > And is it possible to set this up in an HA environment? > > > > Thanks! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > > > querying this through the policy engine is far to late to do any thing > useful with it > > > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > > Ben, > > > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > > > thx. Sven > > > > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > > Id like this to see hot files > > > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! > -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. 
> > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) 
Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. 
This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
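One practical note for anyone experimenting with the DMAPI route discussed above: DMAPI has to be enabled per file system before a session can receive events, and, if memory serves, the file system must be unmounted everywhere to change that flag, while mount behaviour in the presence of a registered DMAPI application is governed by settings such as dmapiMountTimeout. A minimal sketch with a placeholder file system name:

# check whether DMAPI is enabled on the file system
mmlsfs gpfs01 -z

# enable it; plan for an outage, since the file system should be unmounted on all nodes first
mmchfs gpfs01 -z yes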
From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Sat, 11 Oct 2014 10:37:10 +0100
Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
Message-ID: <5438FA46.7090902@ebi.ac.uk>

Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups, each of them with a drive, and to set 3 metadata replicas as the default. I had not considered that the vdisks could be off after a reboot or failure, so that's a good point, but after a failure or even a standard reboot the server and the cluster have to be checked anyway, and I always check the vdisk status, so no big deal.

Your answer made me consider another thing... once they are put back online, will they be restriped automatically, or should I run 'mmrestripefs' every time to verify/correct the replicas?

I understand that using local disks sounds strange; in fact our first idea was just to add some SSDs to the shared storage, but then we considered that the SAS cable could be a huge bottleneck. The cost difference is not huge, and the FusionIO sitting locally in the servers would make the metadata just fly.

On 10/10/14 17:02, Sanchez, Paul wrote:
>
> Hi Salvatore,
>
> We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:
>
> * Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps, if your workload can tolerate it, the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.)
>
> * Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available.
>
> * When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start -a).
>
> I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work.
>
> Thx
> Paul Sanchez
>
> *From:* gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of* Salvatore Di Nardo
> *Sent:* Thursday, October 09, 2014 8:03 AM
> *To:* gpfsug main discussion list
> *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable?
>
> Hello everyone,
>
> Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting the metadata on shared storage we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of server failure?
>
> To make it more clear: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data?
>
> Thanks in advance
> Salvatore Di Nardo
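For the restripe question above, here is a minimal sketch of the check-and-restart sequence Paul describes; it assumes the filesystem device is called gpfs1 (a placeholder). Starting a stopped disk already makes GPFS scan and repair the stale replicas on it, so the explicit mmrestripefs pass is the conservative extra step rather than a strict requirement:

    # list any disks that are not up/ready after the server comes back
    mmlsdisk gpfs1 -e

    # start all stopped disks; their stale metadata replicas get repaired
    mmchdisk gpfs1 start -a

    # optionally force a full re-replication/repair pass afterwards
    mmrestripefs gpfs1 -r

Apart from the extra I/O it generates, the mmrestripefs -r pass is harmless when replication is already correct, so running it after any extended outage is a reasonable habit.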
From service at metamodul.com Sun Oct 12 17:03:56 2014
From: service at metamodul.com (MetaService)
Date: Sun, 12 Oct 2014 18:03:56 +0200
Subject: [gpfsug-discuss] filesets and mountpoint naming
In-Reply-To:
References:
Message-ID: <1413129836.4846.9.camel@titan>

My preferred naming convention is to use the cluster name or part of it as the base directory for all GPFS mounts.

Example: Clustername=c1_eum would mean that /c1_eum/ would be the base directory for all cluster c1_eum GPFS filesystems. In case a second local cluster existed, its root mount point would be /c2_eum/. Even in case of mounting remote clusters a naming collision is not very likely.

BTW: For accessing the final directories (/.../scratch ...) the user should not rely on the mount points but on variables provided, e.g.:

CLS_HOME=/...
CLS_SCRATCH=/....

hth
Hajo

From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014
From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks-Barlow)
Date: Fri, 10 Oct 2014 17:48:24 +0100
Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>
Message-ID: <54380DD8.2020909@ocf.co.uk>

Hi Salvatore,

Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whilst the GPFS cluster renegotiates.

The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS; once you've started the disk again it will rediscover and scan the metadata for any missing updates, and these updates are then repaired/replicated again.

Laurence Horrocks-Barlow
Linux Systems Software Engineer
OCF plc

Tel: +44 (0)114 257 2200
Fax: +44 (0)114 257 0022
Web: www.ocf.co.uk
Blog: blog.ocf.co.uk
Twitter: @ocfplc

OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG.

On 10/10/2014 17:02, Sanchez, Paul wrote:
> Hi Salvatore,
>
> We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:
>
> [...]

From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014
From: kraemerf at de.ibm.com (Frank Kraemer)
Date: Mon, 13 Oct 2014 13:10:17 +0200
Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany
Message-ID:

GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany
Oct 14th 11:15-12:05 Room 18
http://sched.co/1uMYEWK

Frank Kraemer
IBM Consulting IT Specialist / Client Technical Architect
Hechtsheimer Str. 2, 55131 Mainz
mailto:kraemerf at de.ibm.com
voice: +49171-3043699
IBM Germany

From service at metamodul.com Mon Oct 13 16:49:44 2014
From: service at metamodul.com (service at metamodul.com)
Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST)
Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany
In-Reply-To:
References:
Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de>

Hallo Frank,
the announcement is a little bit too late for me. Would be nice if you could share your speech later.
cheers
Hajo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
[...]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Tue, 14 Oct 2014 16:23:01 +0100
Subject: Re: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk>
Message-ID: <543D3FD5.1060705@ebi.ac.uk>

On 14/10/14 15:51, Sven Oehme wrote:
> it means there is contention on inserting data into the fast write log
> on the GSS Node, which could be config or workload related
> what GSS code version are you running

[root at ebi5-251 ~]# mmdiag --version

=== mmdiag: version ===
Current GPFS build: "3.5.0-11 efix1 (888041)".
Built on Jul 9 2013 at 18:03:32
Running 6 days 2 hours 10 minutes 35 secs

> and how are the nodes connected with each other (Ethernet or IB) ?

ethernet. they use the same bonding (4x10Gb/s) where the data is passing.
We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> [...]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 09:22:41 -0700
Subject: Re: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D3FD5.1060705@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk>
Message-ID:

your GSS code version is very backlevel.

can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all

thx.
Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From: Salvatore Di Nardo
To: gpfsug-discuss at gpfsug.org
Date: 10/14/2014 08:23 AM
Subject: Re: [gpfsug-discuss] wait for permission to append to log
Sent by: gpfsug-discuss-bounces at gpfsug.org

[...]

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014
From: sdinardo at ebi.ac.uk (Salvatore Di Nardo)
Date: Tue, 14 Oct 2014 17:39:18 +0100
Subject: Re: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk>
Message-ID: <543D51B6.3070602@ebi.ac.uk>

Thanks in advance for your help. We have 6 RG:

recovery group      vdisks   vdisks   servers
------------------  -------  -------  -------
gss01a                    4        8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
gss01b                    4        8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
gss02a                    4        8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
gss02b                    4        8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
gss03a                    4        8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
gss03b                    4        8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk

Check the attached file for RG details.
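The per-RG detail went out as an attachment, which the list archive strips. The report Sven asked for can be regenerated on one of the GSS nodes with a loop along these lines; this is only a sketch that feeds the first column of the plain mmlsrecoverygroup summary back into the -L --pdisk form, and it assumes all recovery group names start with "gss", as in the table above:

    # dump the detailed layout, pdisks included, of every recovery group
    for rg in $(mmlsrecoverygroup | awk '$1 ~ /^gss/ {print $1}'); do
        mmlsrecoverygroup "$rg" -L --pdisk
    done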
Following mmlsconfig:

[root at gss01a ~]# mmlsconfig
Configuration data for cluster GSS.ebi.ac.uk:
---------------------------------------------
myNodeConfigNumber 1
clusterName GSS.ebi.ac.uk
clusterId 17987981184946329605
autoload no
dmapiFileHandleSize 32
minReleaseLevel 3.5.0.11
[gss01a,gss01b,gss02a,gss02b,gss03a,gss03b]
pagepool 38g
nsdRAIDBufferPoolSizePct 80
maxBufferDescs 2m
numaMemoryInterleave yes
prefetchPct 5
maxblocksize 16m
nsdRAIDTracks 128k
ioHistorySize 64k
nsdRAIDSmallBufferSize 256k
nsdMaxWorkerThreads 3k
nsdMinWorkerThreads 3k
nsdRAIDSmallThreadRatio 2
nsdRAIDThreadsPerQueue 16
nsdClientCksumTypeLocal ck64
nsdClientCksumTypeRemote ck64
nsdRAIDEventLogToConsole all
nsdRAIDFastWriteFSDataLimit 64k
nsdRAIDFastWriteFSMetadataLimit 256k
nsdRAIDReconstructAggressiveness 1
nsdRAIDFlusherBuffersLowWatermarkPct 20
nsdRAIDFlusherBuffersLimitPct 80
nsdRAIDFlusherTracksLowWatermarkPct 20
nsdRAIDFlusherTracksLimitPct 80
nsdRAIDFlusherFWLogHighWatermarkMB 1000
nsdRAIDFlusherFWLogLimitMB 5000
nsdRAIDFlusherThreadsLowWatermark 1
nsdRAIDFlusherThreadsHighWatermark 512
nsdRAIDBlockDeviceMaxSectorsKB 4096
nsdRAIDBlockDeviceNrRequests 32
nsdRAIDBlockDeviceQueueDepth 16
nsdRAIDBlockDeviceScheduler deadline
nsdRAIDMaxTransientStale2FT 1
nsdRAIDMaxTransientStale3FT 1
syncWorkerThreads 256
tscWorkerPool 64
nsdInlineWriteMax 32k
maxFilesToCache 12k
maxStatCache 512
maxGeneralThreads 1280
flushedDataTarget 1024
flushedInodeTarget 1024
maxFileCleaners 1024
maxBufferCleaners 1024
logBufferCount 20
logWrapAmountPct 2
logWrapThreads 128
maxAllocRegionsPerNode 32
maxBackgroundDeletionThreads 16
maxInodeDeallocPrefetch 128
maxMBpS 16000
maxReceiverThreads 128
worker1Threads 1024
worker3Threads 32
[common]
cipherList AUTHONLY
socketMaxListenConnections 1500
failureDetectionTime 60
[common]
adminMode central

File systems in cluster GSS.ebi.ac.uk:
--------------------------------------
/dev/gpfs1

For more configuration parameters I also attached a file with the complete output of mmdiag --config.

and mmlsfs:
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes
opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4
! pagepool 40802189312
pagepoolMaxPhysMemPct 75
prefetchAggressiveness 2
prefetchAggressivenessRead -1
prefetchAggressivenessWrite -1
! prefetchPct 5
prefetchThreads 72
readReplicaPolicy default
remoteMountTimeout 10
sharedMemLimit 0
sharedMemReservePct 15
sidAutoMapRangeLength 15000000
sidAutoMapRangeStart 15000000
! socketMaxListenConnections 1500
socketRcvBufferSize 0
socketSndBufferSize 0
statCacheDirPct 10
subnets
! syncWorkerThreads 256
tiebreaker system
tiebreakerDisks
tokenMemLimit 536870912
treatOSyncLikeODSync 1
tscTcpPort 1191
! tscWorkerPool 64
uidDomain GSS.ebi.ac.uk
uidExpiration 36000
unmountOnDiskFail no
useDIOXW 1
usePersistentReserve 0
verbsLibName libibverbs.so
verbsPorts
verbsRdma disable
verbsRdmaCm disable
verbsRdmaCmLibName librdmacm.so
verbsRdmaMaxSendBytes 16777216
verbsRdmaMinBytes 8192
verbsRdmaQpRtrMinRnrTimer 18
verbsRdmaQpRtrPathMtu 2048
verbsRdmaQpRtrSl 0
verbsRdmaQpRtrSlDynamic 0
verbsRdmaQpRtrSlDynamicTimeout 10
verbsRdmaQpRtsRetryCnt 6
verbsRdmaQpRtsRnrRetry 6
verbsRdmaQpRtsTimeout 18
verbsRdmaSend 0
verbsRdmasPerConnection 8
verbsRdmasPerNode 0
verbsRdmaTimeout 18
verifyGpfsReady 0
! worker1Threads 1024
! worker3Threads 32
writebehindThreshold 524288

From oehmes at us.ibm.com  Tue Oct 14 18:23:50 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:23:50 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D51B6.3070602@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID: 

You are basically running GSS 1.0 code, while the current version is GSS 2.0 (which replaced Version 1.5 two months ago). GSS 1.5 and 2.0 have several enhancements in this space, so I strongly encourage you to upgrade your systems.

If you can specify a bit more about what your workload is, there might also be additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo 
> To: gpfsug main discussion list 
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> Thanks in advance for your help.
>
> We have 6 RG:
> recovery group      vdisks  vdisks  servers
> ------------------  ------  ------  -------
> gss01a                   4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                   4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                   4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                   4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                   4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                   4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically > in the same row, and due to network constraints was not possible to > put them in the same subnet). The packets are routed, but should not > be a problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
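For anyone else chasing VdiskLogAppendCondvar waiters like the ones quoted above, a quick way to quantify the contention and to gather the data Sven asks for is something along these lines. This is only a rough sketch: mmdiag, mmlsrecoverygroup, mmlsconfig, mmlsfs and mmdsh are standard GPFS/GSS commands, but the output paths are arbitrary and the awk header-skipping may need adjusting to your exact output format.

    # Histogram of waiter reasons on a GSS server (run per node, or wrap
    # the command in mmdsh -N <nodelist> to hit all recovery group servers)
    mmdiag --waiters | grep -o "reason '.*'" | sort | uniq -c | sort -rn | head

    # Collect what was requested above: per-RG detail plus cluster and filesystem config
    for rg in $(mmlsrecoverygroup | awk 'NR>2 {print $1}'); do   # skip the header lines
        mmlsrecoverygroup "$rg" -L --pdisk > "/tmp/${rg}.rg.txt"
    done
    mmlsconfig > /tmp/mmlsconfig.txt
    mmlsfs all > /tmp/mmlsfs.txt

The histogram makes it obvious whether the waiters are dominated by VdiskLogAppendCondvar, which is the fast-write-log contention Sven describes earlier in the thread, or spread across other reasons.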
URL: From zgiles at gmail.com Tue Oct 14 18:32:50 2014 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 14 Oct 2014 13:32:50 -0400 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. >> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. 
>> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. >> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? 
>> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. 
We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA27A00 
waiting 0.110025022 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, >> NSDThread: 
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:38:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: i personally don't know, i am in GPFS Research, not in support :-) but have you tried to contact your sales rep ? if you are not successful with that, shoot me a direct email with details about your company name, country and customer number and i try to get you somebody to help. thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 10/14/2014 10:33 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. 
>> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. >> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. 
>> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? >> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. 
Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA27A00 
waiting 0.110025022 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, >> NSDThread: 
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
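On the single-namespace point above: one common way to carve one large GPFS filesystem into manageable pieces for that many home directories is independent filesets linked under a common path, rather than separate filesystems. A minimal sketch follows; the filesystem name (gpfs1), fileset names, junction paths and inode limits are invented for illustration and are not the actual Kingston layout.

    # Create independent filesets (each with its own inode space) and link them
    # into the namespace
    mmcrfileset gpfs1 home_staff --inode-space new --inode-limit 2000000
    mmlinkfileset gpfs1 home_staff -J /gpfs1/home/staff

    mmcrfileset gpfs1 home_students --inode-space new --inode-limit 8000000
    mmlinkfileset gpfs1 home_students -J /gpfs1/home/students

    # Verify the filesets, their junctions and inode spaces
    mmlsfileset gpfs1 -L

Independent filesets can then carry their own quotas and snapshots, which is usually the reason for choosing them over plain directories in a setup like this.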
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
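(Since none of the replies above walk through the actual plumbing behind Bill's three questions, here is a minimal sketch of a typical ctdb 2.x plus clustered Samba layout from that era; every IP, interface name and path below is invented purely for illustration, and the file locations are the usual RHEL-style ones but vary by distribution:

    # /etc/ctdb/nodes -- one line per cluster member, its private/internal IP
    10.10.0.1
    10.10.0.2

    # /etc/ctdb/public_addresses -- floating client-facing IPs, one interface per line;
    # separate entries per interface is how distinct VLANs (e.g. 1GbE and 10GbE) are handled
    192.168.104.50/24 eth0
    192.168.103.50/24 eth1

    # /etc/sysconfig/ctdb -- the recovery lock must live on GPFS so all nodes can see it
    CTDB_RECOVERY_LOCK=/gpfs01/.ctdb/reclock
    CTDB_MANAGES_SAMBA=yes
    CTDB_MANAGES_NFS=yes      # only if kernel NFS should fail over under ctdb as well

    # smb.conf fragment -- hand the TDB databases over to ctdb
    [global]
        clustering = yes

ctdb then moves the public addresses between healthy nodes, so SMB and NFS clients mount the floating IPs rather than any one server.)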
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
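(To make the File Heat suggestion above concrete: the parameter and policy attribute names below are the ones documented for GPFS 3.5/4.1, but treat this as an untested sketch and check them against your release; the file system name, policy file name and list name are made up for the example.

    # enable heat tracking cluster-wide; with these values heat decays 10% per 24-hour period
    mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

    # hot.pol -- have the policy engine rank files by FILE_HEAT
    RULE 'hotdef' EXTERNAL LIST 'hotfiles' EXEC ''
    RULE 'hot' LIST 'hotfiles' WEIGHT(FILE_HEAT)

    # defer execution and just write the candidate list, with weights, under /tmp/hot.list.*
    mmapplypolicy gpfs01 -P hot.pol -I defer -f /tmp/hot.list

This still drives a full metadata scan through mmapplypolicy, so it answers "which files are hot" after the fact but not the out-of-band, always-on change log that the RFE is asking for.)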
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year. 
Here is the basic agenda that was provided:

GPFS/Elastic Storage User Group
Monday, November 17, 2014

3:00 PM-5:00 PM: GPFS/Elastic Storage User Group
- IBM Software Defined Storage strategy update
- Customer presentations
- Future directions such as object storage and OpenStack integration
- Elastic Storage server update
- Elastic Storage roadmap (*NDA required)
5:00 PM: Reception

Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance.

I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature.

Thanks!
-Bryan

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.

-------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org>
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
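As a rough illustration of the File Heat approach Sven mentions above, the sketch below enables heat tracking and then uses a deferred policy scan to dump a list of files with a non-zero heat value. The file system name (gpfs01), the period/loss values and the exact rule wording are placeholders based on the GPFS documentation of that era, so verify the parameter names and the generated list-file name against your release before relying on it.

--
# enable file heat tracking (values are illustrative, not tuning advice)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# policy that lists files with a non-zero heat value
cat > /tmp/hot.pol <<'EOF'
RULE 'ext'  EXTERNAL LIST 'hotFiles' EXEC ''
RULE 'heat' LIST 'hotFiles' SHOW(varchar(FILE_HEAT)) WHERE FILE_HEAT > 0.0
EOF

# deferred run: writes the candidate list under /tmp instead of executing anything
mmapplypolicy gpfs01 -P /tmp/hot.pol -I defer -f /tmp/hotFiles
--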
From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 24 Oct 2014 18:58:07 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com> It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem that I had with my configuration is that the NSD client cluster was not completely updated to GPFS 4.1.0-3, as there are a few nodes still running 3.5.0-20 in the cluster which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration "requirement" isn't documented in the Advanced Admin Guide, but it makes sense that this is required since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD Client cluster and the configuration works as desired. Thanks Kalyan and others for their feedback. Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one Gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 x "Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node. Thanks!
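The mixed-version condition Bryan describes (a few 3.5.0-20 nodes blocking mmchconfig release=LATEST) is easy to check for up front. Here is a small sketch, assuming passwordless ssh and placeholder node names; mmdiag --version reports the daemon build running on a node and mmlsconfig shows the level the cluster is currently committed to:

--
# level the cluster is currently committed to
mmlsconfig minReleaseLevel

# daemon build actually running on each node (node names below are placeholders)
for n in mminsd5 mminsd6 compute001; do
    echo "== $n =="
    ssh "$n" /usr/lpp/mmfs/bin/mmdiag --version
done

# only once every node reports the target level:
# mmchconfig release=LATEST
--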
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
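To make the two-step migration sequence discussed above concrete (build a file list on the old file system with mmapplypolicy, then hand it to mmafmctl prefetch on the AFM fileset), here is a rough sketch. The paths, file system and fileset names are placeholders, and the list option spelling and expected list format have varied between releases (the thread uses --home-inode-file), so follow the Advanced Admin Guide for the level you actually run:

--
# 1) generate a candidate list from the old (home) file system
cat > /tmp/all.pol <<'EOF'
RULE 'ext' EXTERNAL LIST 'migrateList' EXEC ''
RULE 'all' LIST 'migrateList'
EOF
mmapplypolicy /gpfs/oldfs01/projects -P /tmp/all.pol -I defer -f /tmp/migrate
# mmapplypolicy reports the list file it wrote, typically /tmp/migrate.list.migrateList

# 2) feed that list to the AFM fileset on the new (cache) file system
mmafmctl newfs01 prefetch -j projects --home-inode-file /tmp/migrate.list.migrateList
--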
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
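Since parallel rsync keeps coming up as the practical fallback, a minimal sketch of the idea: split the namespace at the top-level directory boundary and keep a bounded number of rsyncs running. The source and destination paths and the concurrency are made up for the example, it assumes the top-level entries are directories, and a final pass after writers are stopped is still required:

--
cd /gpfs/oldfs01/projects
# one rsync per top-level directory, at most 8 at a time (GNU xargs -P)
ls -1 | xargs -P 8 -I{} rsync -a {}/ /gpfs/newfs01/projects/{}/

# at cut-over, stop writers and repeat the same command for a final delta pass
--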
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information.
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Wed Oct 29 13:59:40 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 29 Oct 2014 13:59:40 +0000 Subject: [gpfsug-discuss] Storagebeers, Nov 13th Message-ID: <5450F2CC.3070302@gpfsug.org> Hello all, I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members. http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/ I'll be popping along. Hopefully see you there. Jez From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 15:31:31 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
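Before changing any NSD configuration in a situation like this, it is worth confirming that the operating system still presents the LUNs and that the descriptor area on disk still looks plausible; every command below only reads from the devices. This is a rough sketch using one of the device names from the listing above, and the strings/parted output is only indicative (an unexpected partition label or missing descriptor text is a reason to open a PMR, not to start recreating NSDs):

--
# does the OS still see the multipath devices at all?
grep dm- /proc/partitions
multipath -ll | head -40
ls -l /dev/mapper/ | grep dcs3800

# read-only peek at one LUN: an intact NSD normally still shows NSD-related
# text near the start of the disk, and no freshly restored GPT label
dd if=/dev/mapper/dcs3800u31a_lun0 bs=1M count=8 2>/dev/null | strings | grep -i nsd | head
parted -s /dev/mapper/dcs3800u31a_lun0 print
--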
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. 
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script, and it produces the following output, with the user exit returning 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be found locally, so I attempted to rediscover them by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that would fix it, but upon restarting GPFS there was still no success.
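The /var/mmfs/etc/nsddevices file mentioned above is a user exit that GPFS consults during device discovery. A minimal sketch of such a script, assuming device-mapper multipath aliases named like the dcs3800u31* LUNs in the output above and modeled loosely on IBM's nsddevices sample (not copied from this cluster), could look like this:

--
#!/bin/ksh
# Sketch of a /var/mmfs/etc/nsddevices user exit.
# It prints one "device devtype" pair per NSD candidate; "dmm" marks a
# device-mapper multipath device.
# ASSUMPTION: the multipath aliases match dcs3800u31[ab]_lun<N> as in the
# output above; adjust the pattern for your own multipath bindings.

for dev in /dev/mapper/dcs3800u31[ab]_lun*
do
    [ -e "$dev" ] || continue
    # GPFS expects the name relative to /dev, hence the mapper/ prefix.
    echo "mapper/${dev##*/} dmm"
done

# Returning 0 tells GPFS to use only the list above and skip its built-in
# device discovery (this mirrors the return 0 in IBM's sample exit).
return 0
--

With a working exit the mapper/* names should end up in the Device column of mmlsnsd; here they do not, which is consistent with the later finding in this thread that the on-disk NSD descriptors themselves are gone rather than the exit being wrong.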
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, I was asking for the content, not the result :-) Can you run cat /var/mmfs/etc/nsddevices? The second output at least confirms that there is no correct label on the disk, as it returns only the EFI GPT signature ("EFI PART"). On a GNR system you get a bit more output, but at least you should see the NSD descriptor string, like I get on my system: [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s While I would still like to see the nsddevices script, I assume your NSD descriptor is wiped, and without a lot of manual labor and at least a recent GPFS dump this is very hard, if at all possible, to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx.
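Sven's dd | strings check can be repeated over every multipath alias to see at a glance which LUNs still carry an NSD descriptor; a small sketch is below, with the alias pattern assumed from the names used in this thread.

--
#!/bin/bash
# Sketch: scan the first 32 KiB of each multipath alias for an NSD
# descriptor, the same check shown above for a single disk.
# ASSUMPTION: aliases are named dcs3800u31[ab]_lun<N> as in this thread.

for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    [ -e "$dev" ] || continue
    echo "== $dev =="
    # A healthy NSD prints a line like:
    #   NSD descriptor for ... created by GPFS ...
    # A disk whose label was overwritten typically shows only "EFI PART".
    dd if="$dev" bs=1k count=32 2>/dev/null | strings | egrep 'NSD descriptor|EFI PART' || echo "(no NSD descriptor or GPT signature found)"
done
--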
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 
active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
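Ed's closing suggestion deserves one caveat: /var/mmfs/gen/mmsdrfs holds the cluster configuration, so copying it over from a healthy node repairs a damaged configuration file but does not rewrite on-disk NSD labels. A rough sketch of that recovery step, using a placeholder host name and assuming GPFS is stopped on the affected node first:

--
# Sketch only; "healthy-node" is a placeholder, not a host from this thread.
mmshutdown                                            # stop GPFS on this node
cp /var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs.save   # keep the stale copy
scp healthy-node:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
mmstartup                                             # daemon re-reads the file on start
--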
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
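For anyone who has not used the nsddevices user exit before, here is the same script again with comments spelling out what each line does (behaviour unchanged from what Jared posted; the sample shipped with GPFS under /usr/lpp/mmfs/samples documents the exit in full):

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices: GPFS user exit that tells the daemon which block
# devices to probe for NSDs instead of relying on its own device scan.

# Match the dm-multipath aliases for the DCS3800 LUNs, e.g. dcs3800u31a_lun0.
CONTROLLER_REGEX='[ab]_lun[0-9]+'

for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
   # Output format is "<device relative to /dev> <device type>";
   # dmm marks a device-mapper multipath device.
   echo mapper/$dev dmm
   #echo mapper/$dev generic
done

# Bypass the built-in GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover)
# and use only the devices listed above.
return 0
--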
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. 
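The strings output Jared posted earlier, EFI PART with no NSD descriptor line, matches Sven's diagnosis exactly: a GPT header carries the ASCII signature "EFI PART" at the start of LBA 1. A hedged check across all LUNs, assuming 512-byte logical sectors on these devices:

--
# Sketch: print the 8-byte signature at LBA 1 (byte offset 512) of each LUN.
# "EFI PART" there means a GPT header has been written over the GPFS label.
# Assumes 512-byte logical sectors; adjust the offset for 4K-sector LUNs.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sig=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null)
    printf '%s: %s\n' "$dev" "${sig:-no GPT signature at LBA 1}"
done
--

A LUN still carrying its GPFS label would instead show the NSD descriptor string somewhere in its first 32 KiB, as in Sven's healthy example above.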
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
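One further thought before the file system is rebuilt: capturing the label area of each LUN ahead of any future firmware or BIOS work would at least leave something to hand to IBM support if a descriptor is ever overwritten again. This is not an official GPFS procedure, only a precaution built on the same dd commands used above, with a hypothetical backup location:

--
# Sketch: save the first 4 MiB of every NSD LUN before maintenance.
# The directory name is hypothetical; restoring a saved header should only
# ever be done under guidance from support, never blindly.
backupdir=/root/nsd-label-backup.$(date +%Y%m%d)
mkdir -p "$backupdir"
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    dd if="$dev" of="$backupdir/$(basename $dev).hdr" bs=1M count=4 2>/dev/null
done
--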
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
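For reference, a hedged sketch of the mmsdrfs transfer Ed describes; the hostname is one of the NSD servers from this thread and is purely illustrative, and the documented tool for the same job is mmsdrrestore:
--
# Copy the cluster configuration file from a healthy NSD server onto the
# rebuilt node (hostname is illustrative).
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
# Alternatively, run mmsdrrestore on the broken node; check the Administration
# Guide for the exact options in your GPFS release.
/usr/lpp/mmfs/bin/mmsdrrestore
--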
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 20:32:25 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 20:32:25 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <54514ED9.9030604@buzzard.me.uk> On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 20:47:51 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:47:51 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <54514ED9.9030604@buzzard.me.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: Jonathan, which script are you talking about? Thanks, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard Sent: Wednesday, October 29, 2014 2:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
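While waiting for the pointer to that script, a rough, assumption-laden way to look for leftover partition-table signatures on a LUN with stock util-linux tools (this is not the script Jonathan refers to, and neither command below erases anything):
--
# List any filesystem/partition-table signatures wipefs can see on the device.
wipefs /dev/mapper/dcs3800u31a_lun0
# blkid can also report the partition-table type, if any (prints "gpt"/"dos").
blkid -p -o value -s PTTYPE /dev/mapper/dcs3800u31a_lun0
--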
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Wed Oct 29 21:01:06 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 21:01:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: <54515592.4050606@buzzard.me.uk> On 29/10/14 20:47, Jared David Baker wrote: > Jonathan, which script are you talking about? > The one here https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25 Use for detecting and clearing that secondary GPT table. Never used it of course, my disaster was caused by an idiot admin installing a new OS not mapping the disks out and then hit yes yes yes when asked if he wanted to blank the disks, the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM 50TB and 80 million files down the swanny. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From mark.bergman at uphs.upenn.edu Fri Oct 31 17:10:55 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 13:10:55 -0400 Subject: [gpfsug-discuss] mapping to hostname? Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm> Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark From bevans at pixitmedia.com Fri Oct 31 17:32:45 2014 From: bevans at pixitmedia.com (Barry Evans) Date: Fri, 31 Oct 2014 17:32:45 +0000 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <5453C7BD.8030608@pixitmedia.com> I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout. Cheers, Barry ArcaStream/Pixit Media mark.bergman at uphs.upenn.edu wrote: > Many GPFS logs& utilities refer to nodes via their name. > > I haven't found an "mm*" executable that shows the mapping between that > name an the hostname. > > Is there a simple method to map the designation to the node's > hostname? > > Thanks, > > Mark > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. 
From oehmes at us.ibm.com Fri Oct 31 18:20:40 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 31 Oct 2014 11:20:40 -0700 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: Hi, the official way to do this is mmdiag --network thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: mark.bergman at uphs.upenn.edu To: gpfsug main discussion list Date: 10/31/2014 10:11 AM Subject: [gpfsug-discuss] mapping to hostname? Sent by: gpfsug-discuss-bounces at gpfsug.org Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Fri Oct 31 18:57:44 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 14:57:44 -0400 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700." References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi> In the message dated: Fri, 31 Oct 2014 11:20:40 -0700, The pithy ruminations from Sven Oehme on to hostname?> were: => Hi, => => the official way to do this is mmdiag --network OK. I'm now using: mmdiag --network | awk '{if ( $1 ~ / => thx. Sven => => => ------------------------------------------ => Sven Oehme => Scalable Storage Research => email: oehmes at us.ibm.com => Phone: +1 (408) 824-8904 => IBM Almaden Research Lab => ------------------------------------------ => => => => From: mark.bergman at uphs.upenn.edu => To: gpfsug main discussion list => Date: 10/31/2014 10:11 AM => Subject: [gpfsug-discuss] mapping to hostname? => Sent by: gpfsug-discuss-bounces at gpfsug.org => => => => Many GPFS logs & utilities refer to nodes via their name. => => I haven't found an "mm*" executable that shows the mapping between that => name an the hostname. => => Is there a simple method to map the designation to the node's => hostname? => => Thanks, => => Mark =>
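Mark's awk filter above was mangled by the list's HTML scrubbing; as a hypothetical stand-in rather than his original command, something along these lines keeps just the mmdiag --network lines that carry the node designations, since the exact column layout varies by release:
--
# Show the lines that contain the <cXnY>-style designations together with the
# hostname/IP on the same line (hypothetical reconstruction, not Mark's awk).
/usr/lpp/mmfs/bin/mmdiag --network | grep '<c'
--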
From Sandra.McLaughlin at astrazeneca.com Mon Oct 6 16:40:45 2014 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 6 Oct 2014 15:40:45 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com> Hi Stuart, We have a very similar setup. I use /gpfs01, /gpfs02 etc.
and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. 
We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. > However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. 
This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 10:33:07 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
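If the test is repeated, a small sketch of one way to sweep for long waiters on the gateway and client nodes while the hang is in progress (node names are placeholders; mmdsh or any other ssh loop would do the same):
--
# Dump waiters on the AFM gateways and a hung client while the ls is stuck.
for n in gw1 gw2 client1; do
    echo "== $n =="
    ssh "$n" /usr/lpp/mmfs/bin/mmdiag --waiters
done
--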
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change:
mmlsfileset fs1 prefetchIW --afm -L
Filesets in file system 'fs1':
Attributes for fileset prefetchIW:
===================================
Status                            Linked
Path                              /gpfs/fs1/prefetchIW
Id                                36
afm-associated                    Yes
Target                            nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch
Mode                              independent-writer
File Lookup Refresh Interval      30 (default)
File Open Refresh Interval        30 (default)
Dir Lookup Refresh Interval       60 (default)
Dir Open Refresh Interval         60 (default)
Async Delay                       15 (default)
Last pSnapId                      0
Display Home Snapshots            no
Number of Gateway Flush Threads   5
Prefetch Threshold                0 (default)
Eviction Enabled                  yes (default)

AFM parallel i/o can be set up such that multiple GW nodes can be used to pull in data. More details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning

Regards Kalyan GPFS Development EGL D Block, Bangalore

From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org

We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5. What version are you using? thx. Sven

On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation.
Cheers, -Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
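(A minimal sketch of the parallel rsync split mentioned in point 4 above. The mount points, host names, and per-top-level-directory split are assumptions for illustration; a real migration would bound the number of concurrent transfers and finish with a final pass during the cutover window.)

-- parallel-rsync.sh (illustrative only) --
#!/bin/bash
# Fan one rsync per top-level directory out across a small pool of nodes.
# /gpfs/old and /gpfs/new are assumed mount points of the two file systems.
NODES=(node01 node02 node03 node04)
i=0
for dir in /gpfs/old/*/ ; do
    node=${NODES[$(( i++ % ${#NODES[@]} ))]}
    # -a preserves mode/owner/times, -H hard links, -A ACLs, -X xattrs
    ssh "$node" rsync -aHAX --numeric-ids --delete \
        "$dir" "/gpfs/new/$(basename "$dir")/" &
done
wait
-- end --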
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding the second downtime, you are right, disabling AFM after data migration requires an unlink and hence downtime. But there is an easy workaround, where revalidation intervals can be increased to the maximum or GW nodes can be unconfigured without downtime, with the same effect. And disabling AFM can be done at a later point during a maintenance window. We plan to modify this to have this done online, i.e. without requiring an unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction.

3) The prefetch operation can only run on a single node and thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just don't cut it due to single-node bandwidth limits. When I was running the prefetch it was only executing roughly 100 "Queue numExec" operations per second. The prefetch operation for a directory with 12 million files was going to take over 33 HOURS just to process the file list!

--> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs (link provided below). In fact it can parallelize data transfer to a single file and also do multiple files in parallel depending on file sizes and various tuning params.

4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems, and do not require the complicated AFM configuration. Yes, there is of course some effort to break up the namespace across the rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance targets are met or to limit the overall impact of the operation if desired.

--> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can in fact use multiple threads on multiple nodes to do parallel i/o.

AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan

-----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations

Hi Bryan, AFM supports GPFS multi-cluster, and we have customers already using this successfully. Are you using a GPFS backend? Can you explain your configuration in detail? If ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per the IBM service process. As far as prefetch is concerned, right now it's limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multiple nodes to pull in data based on configuration. The "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via the mmchfileset cmd (mmchfileset pubs don't show this param for some reason, I will have that updated.)
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
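(To make the point about multiple gateway nodes and parallel prefetch concrete, a hedged sketch of the knobs involved. The file system, fileset, and node names are illustrative, and exact option spellings should be checked against the GPFS 4.1 documentation linked above.)

-- illustrative only --
# designate extra gateway nodes so AFM parallel i/o has nodes to fan out across
mmchnode --gateway -N gw01,gw02,gw03,gw04
# raise the per-fileset flush/prefetch thread count, as shown earlier in the thread
mmchfileset fs1 prefetchIW -p afmnumflushthreads=8
# build the file list on the old file system with mmapplypolicy, then queue the prefetch
mmafmctl fs1 prefetch -j prefetchIW --home-inode-file /tmp/prefetch.list
-- end --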
Cheers, -Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 09 Oct 2014 13:02:44 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?
Message-ID: <54367964.1050900@ebi.ac.uk>

Hello everyone, Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting metadata on shared storage, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of a server failure? To make it more clear: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo

From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 9 Oct 2014 19:31:28 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com>

Just wanted to pass my GPFS RFE along: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458

Description: GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis.

Use case: This could be used for a very large number of file system management applications, including: 1) SOBAR (Scale-Out Backup And Restore) 2) Data Security Auditing and Monitoring applications 3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources. 4) Application file system access profiling

Please vote for it if you feel it would also benefit your operation, thanks, -Bryan
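(For context on the status quo this RFE wants to replace: finding recent changes today generally means an mmapplypolicy scan over all of the metadata. A minimal sketch, following the list-rule style used later in this thread; the file system name and one-day window are illustrative.)

-- recent-changes.pol (illustrative only) --
RULE 'ext' EXTERNAL LIST 'changed' EXEC ''
/* list every file modified in the last day; the policy engine still scans all inodes */
RULE 'recent' LIST 'changed' WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS
-- end --

mmapplypolicy gpfs01 -P recent-changes.pol -I defer -f /tmp/changed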
From service at metamodul.com Fri Oct 10 13:21:43 2014 From: service at metamodul.com (service at metamodul.com) Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST) Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de>

> Bryan Banister wrote on 9 October 2014 at 21:31: > > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > >

I would like to support the RFE but I get: "You cannot access this page because you do not have the proper authority." Cheers Hajo

From pgp at psu.edu Fri Oct 10 16:04:02 2014 From: pgp at psu.edu (Phil Pishioneri) Date: Fri, 10 Oct 2014 11:04:02 -0400 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <5437F562.1080609@psu.edu>

On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... >

The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil

From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 15:08:04 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <5437F562.1080609@psu.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com>

Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister wrote: > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! 
> -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > >> Hmm... I didn't think to use the DMAPI interface. That could be a nice >> option. 
Has anybody done this already and are there any examples we could >> look at? >> >> Thanks! >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >> Sent: Friday, October 10, 2014 10:04 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >> >> On 10/9/14 3:31 PM, Bryan Banister wrote: >> > >> > Just wanted to pass my GPFS RFE along: >> > >> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >> > 0458 >> > >> > >> > *Description*: >> > >> > GPFS File System Manager should provide the option to log all file and >> > directory operations that occur in a file system, preferably stored in >> > a TSD (Time Series Database) that could be quickly queried through an >> > API interface and command line tools. ... >> > >> >> The rudimentaries for this already exist via the DMAPI interface in GPFS >> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >> DeveloperWorks forum: >> >> On 1/3/11 10:27 AM, dWForums wrote: >> > Author: >> > AlokK.Dhir >> > >> > Message: >> > We have a proof of concept which uses DMAPI to listens to and passively >> logs filesystem changes with a non blocking listener. This log can be used >> to generate backup sets etc. Unfortunately, a bug in the current DMAPI >> keeps this approach from working in the case of certain events. I am told >> 3.4.0.3 may contain a fix. We will gladly share the code once it is >> working. >> >> -Phil >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, and >> to please notify the sender immediately and destroy this email and any >> attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to the >> completeness or accuracy of this email or any attachments. This email is >> for informational purposes only and does not constitute a recommendation, >> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >> or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com>

Hi Salvatore, We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints:

* Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.)

* Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available.

* When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start -a).

I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work. Thx Paul Sanchez

From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Salvatore Di Nardo Sent: Thursday, October 09, 2014 8:03 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable?

Hello everyone, Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting metadata on shared storage, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations (http://www.fusionio.com/products/iodrive) and, for reliability, replicate the metadata across all the servers. Will this work in case of a server failure? To make it more clear: if a server fails I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo

From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID:

As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file system to gather the values:

-- heat.pol --
define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END])
rule fh1 external list 'fh' exec ''
rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) )
-- heat.pol --

Produces output similar to this:

/gpfs/.../specFile.pyc 535089836 5892
/gpfs/.../syspath.py 528685287 806
/gpfs/---/bwe.py 528160670 4607

Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values.
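(A hedged note on how the FILE_HEAT values above get populated and how the policy is typically run; the file system name, period, and loss-percent values are illustrative, not recommendations.)

-- illustrative only --
# file heat must be enabled before FILE_HEAT attributes are maintained
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10
# run the policy in list mode; with EXEC '' and -I defer the matched files
# are written to list files under the prefix given by -f
mmapplypolicy gpfs01 -P heat.pol -I defer -f /tmp/fileheat
-- end --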
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications

From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID:

Querying this through the policy engine is far too late to do anything useful with it.

On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven

From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com>

I agree with Ben, I think. I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com> We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.) This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems. -Paul Sanchez -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Tuesday, September 23, 2014 11:47 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. 
its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister wrote: > I agree with Ben, I think. > > > > I don?t want to use the ILM policy engine as that puts a direct workload > against the metadata storage and server resources. We need something > out-of-band, out of the file system operational path. > > > > Is there a simple DMAPI daemon that would log the file system namespace > changes that we could use? > > > > If so are there any limitations? > > > > And is it possible to set this up in an HA environment? > > > > Thanks! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > > > querying this through the policy engine is far to late to do any thing > useful with it > > > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > > Ben, > > > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > > > thx. Sven > > > > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > > Id like this to see hot files > > > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! > -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. 
> -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com>

A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!)
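(For background on the dependency described here: DMAPI has to be enabled per file system, and mounts can then wait on a DMAPI session. A hedged sketch; the file system name and timeout value are illustrative.)

-- illustrative only --
# DMAPI must be enabled on the file system before any event listener can attach
mmchfs gpfs01 -z yes
mmlsfs gpfs01 -z          # verify the setting
# how long a mount will wait for a DMAPI session/disposition to appear (seconds)
mmchconfig dmapiMountTimeout=60
-- end --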
Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. 
This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Sat, 11 Oct 2014 10:37:10 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <5438FA46.7090902@ebi.ac.uk> Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups, each of them with a drive, and 3 metadata replicas set as the default. I have not considered that the vdisks could be off after a 'reboot' or failure, so that's a good point, but anyway, after a failure or even a standard reboot the server and the cluster have to be checked anyway, and I always check the vdisk status, so no big deal. Your answer made me consider another thing... once they are put back online, will they be restriped automatically, or should I run 'mmrestripefs' every time to verify/correct the replicas? I understand that using local disks sounds strange; in fact our first idea was just to add some SSDs to the shared storage, but then we considered that the SAS cables could be a huge bottleneck. The cost difference is not huge, and the FusionIO cards locally on the servers would make the metadata just fly. On 10/10/14 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start -a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable?
> > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Sun Oct 12 17:03:56 2014 From: service at metamodul.com (MetaService) Date: Sun, 12 Oct 2014 18:03:56 +0200 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <1413129836.4846.9.camel@titan> My preferred naming convention is to use the cluster name or part of it as the base directory for all GPFS mounts. Example: Clustername=c1_eum would mean that: /c1_eum/ would be the base directory for all Cluster c1_eum GPFSs In case a second local cluster would exist its root mount point would be /c2_eum/ Even in case of mounting remote clusters a naming collision is not very likely. BTW: For accessing the the final directories /.../scratch ... the user should not rely on the mount points but on given variables provided. CLS_HOME=/... CLS_SCRATCH=/.... hth Hajo From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Fri, 10 Oct 2014 17:48:24 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <54380DD8.2020909@ocf.co.uk> Hi Salvatore, Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whist the GPFS cluster renegotiates. The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS, once you've started the disk again it will rediscover and scan the metadata for any missing updates, these updates are then repaired/replicated again. Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 10/10/2014 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. 
You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start ?a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable? > > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lhorrocks-barlow.vcf Type: text/x-vcard Size: 388 bytes Desc: not available URL: From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 13 Oct 2014 13:10:17 +0200 Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Message-ID: GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Oct 14th 11:15-12:05 Room 18 http://sched.co/1uMYEWK Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From service at metamodul.com Mon Oct 13 16:49:44 2014 From: service at metamodul.com (service at metamodul.com) Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST) Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany In-Reply-To: References: Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de> Hallo Frank, the announcement is a little bit to late for me. Would be nice if you could share your speech later. cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 16:23:01 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> Message-ID: <543D3FD5.1060705@ebi.ac.uk> On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs > and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. 
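In case it helps, this is how I usually double-check that part on a GSS node (a rough sketch; bond0 is an assumption, the bond interface may be named differently here):

   # bonding mode, slave state and negotiated speed of the data bond
   cat /proc/net/bonding/bond0

   # how GPFS itself sees its connections to the other nodes
   mmdiag --network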
We don't have a dedicated admin network. [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnets because of datacenter constraints (they are not physically in the same row, and due to network constraints it was not possible to put them in the same subnet). The packets are routed, but that should not be a problem as there is 160Gb/s of bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters?
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 09:22:41 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D3FD5.1060705@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: your GSS code version is very backlevel. can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all thx. 
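if it helps, something like this should capture everything in one go (just a sketch, i am guessing the recovery group names follow your node names, so substitute whatever a plain mmlsrecoverygroup prints):

   # run on one of the GSS servers
   for rg in gss01a gss01b gss02a gss02b gss03a gss03b ; do
       mmlsrecoverygroup $rg -L --pdisk
   done > /tmp/gss_rg_details.txt
   mmlsconfig > /tmp/gss_mmlsconfig.txt
   mmlsfs all > /tmp/gss_mmlsfs.txt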
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug-discuss at gpfsug.org Date: 10/14/2014 08:23 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org On 14/10/14 15:51, Sven Oehme wrote: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
Does it mean that the vdisk logs are struggling?
Regards, Salvatore
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 17:39:18 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: <543D51B6.3070602@ebi.ac.uk>
Thanks in advance for your help.
We have 6 RG:
recovery group vdisks vdisks servers
------------------ ----------- ------ -------
gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
Check the attached file for RG details.
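As a side note, waiter storms like the one quoted above are easier to reason about when tallied per host and wait reason rather than eyeballed. The following is only a minimal sketch, not anything specific to this cluster: it assumes the raw waiter text (for example the output of 'mmdiag --waiters' captured on each GSS server, with the host name prefixed as in the listings above) is piped in on stdin, and it simply counts waiters per (host, reason) and reports the longest wait seen for each.

#!/usr/bin/env python
# Sketch: summarise GPFS waiter dumps by host and wait reason.
# Input: raw waiter text on stdin, e.g. saved 'mmdiag --waiters' output
# prefixed with the host name, as in the listings quoted in this thread.
import re
import sys
from collections import defaultdict

# Matches: "host: 0xADDR waiting N.NNN seconds, ... reason 'some reason'"
WAITER = re.compile(
    r"(?P<host>\S+):\s+0x[0-9A-Fa-f]+\s+waiting\s+(?P<secs>[0-9.]+)\s+seconds,"
    r".*?reason\s+'(?P<reason>[^']+)'")

counts = defaultdict(int)      # (host, reason) -> number of waiters
longest = defaultdict(float)   # (host, reason) -> longest wait seen

for m in WAITER.finditer(sys.stdin.read()):
    key = (m.group("host"), m.group("reason"))
    counts[key] += 1
    if float(m.group("secs")) > longest[key]:
        longest[key] = float(m.group("secs"))

for (host, reason), n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print("%5d waiters  max %10.6fs  %-18s %s" % (n, longest[(host, reason)], host, reason))

Fed the dump quoted above, it would show that essentially all of the NSDThread waiters on gss02b.ebi.ac.uk are blocked on 'wait for permission to append to log', which is the pattern being discussed in this thread.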
Following mmlsconfig: [root at gss01a ~]# mmlsconfig Configuration data for cluster GSS.ebi.ac.uk: --------------------------------------------- myNodeConfigNumber 1 clusterName GSS.ebi.ac.uk clusterId 17987981184946329605 autoload no dmapiFileHandleSize 32 minReleaseLevel 3.5.0.11 [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] pagepool 38g nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m numaMemoryInterleave yes prefetchPct 5 maxblocksize 16m nsdRAIDTracks 128k ioHistorySize 64k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 64k nsdRAIDFastWriteFSMetadataLimit 256k nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 4096 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 syncWorkerThreads 256 tscWorkerPool 64 nsdInlineWriteMax 32k maxFilesToCache 12k maxStatCache 512 maxGeneralThreads 1280 flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 maxMBpS 16000 maxReceiverThreads 128 worker1Threads 1024 worker3Threads 32 [common] cipherList AUTHONLY socketMaxListenConnections 1500 failureDetectionTime 60 [common] adminMode central File systems in cluster GSS.ebi.ac.uk: -------------------------------------- /dev/gpfs1 For more configuration paramenters i also attached a file with the complete output of mmdiag --config. and mmlsfs: File system attributes for /dev/gpfs1: ====================================== flag value description ------------------- ------------------------ ----------------------------------- -f 32768 Minimum fragment size in bytes (system pool) 262144 Minimum fragment size in bytes (other pools) -i 512 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1000 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 8388608 Block size (other pools) -Q user;group;fileset Quotas enforced user;group;fileset Default quotas enabled --filesetdf no Fileset df enabled? -V 13.23 (3.5.0.7) File system version --create-time Tue Mar 18 16:01:24 2014 File system creation time -u yes Support for large LUNs? -z no Is DMAPI enabled? -L 4194304 Logfile size -E yes Exact mtime mount option -S yes Suppress atime mount option -K whenpossible Strict replica allocation option --fastea yes Fast external attributes enabled? 
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > _http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4 ! pagepool 40802189312 pagepoolMaxPhysMemPct 75 prefetchAggressiveness 2 prefetchAggressivenessRead -1 prefetchAggressivenessWrite -1 ! prefetchPct 5 prefetchThreads 72 readReplicaPolicy default remoteMountTimeout 10 sharedMemLimit 0 sharedMemReservePct 15 sidAutoMapRangeLength 15000000 sidAutoMapRangeStart 15000000 ! socketMaxListenConnections 1500 socketRcvBufferSize 0 socketSndBufferSize 0 statCacheDirPct 10 subnets ! syncWorkerThreads 256 tiebreaker system tiebreakerDisks tokenMemLimit 536870912 treatOSyncLikeODSync 1 tscTcpPort 1191 ! tscWorkerPool 64 uidDomain GSS.ebi.ac.uk uidExpiration 36000 unmountOnDiskFail no useDIOXW 1 usePersistentReserve 0 verbsLibName libibverbs.so verbsPorts verbsRdma disable verbsRdmaCm disable verbsRdmaCmLibName librdmacm.so verbsRdmaMaxSendBytes 16777216 verbsRdmaMinBytes 8192 verbsRdmaQpRtrMinRnrTimer 18 verbsRdmaQpRtrPathMtu 2048 verbsRdmaQpRtrSl 0 verbsRdmaQpRtrSlDynamic 0 verbsRdmaQpRtrSlDynamicTimeout 10 verbsRdmaQpRtsRetryCnt 6 verbsRdmaQpRtsRnrRetry 6 verbsRdmaQpRtsTimeout 18 verbsRdmaSend 0 verbsRdmasPerConnection 8 verbsRdmasPerNode 0 verbsRdmaTimeout 18 verifyGpfsReady 0 ! worker1Threads 1024 ! worker3Threads 32 writebehindThreshold 524288 From oehmes at us.ibm.com Tue Oct 14 18:23:50 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:23:50 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D51B6.3070602@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: you basically run GSS 1.0 code , while in the current version is GSS 2.0 (which replaced Version 1.5 2 month ago) GSS 1.5 and 2.0 have several enhancements in this space so i strongly encourage you to upgrade your systems. if you can specify a bit what your workload is there might also be additional knobs we can turn to change the behavior. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 09:40 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > Thanks in advance for your help. > > We have 6 RG: > recovery group vdisks vdisks servers > ------------------ ----------- ------ ------- > gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk > gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk > gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk > gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk > gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk > gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk > > Check the attached file for RG details. 
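
For reference, the recovery group detail attached above and the mmlsconfig / mmlsfs listings quoted below all come from standard GPFS commands, so they can be regenerated on any of the GSS servers. A minimal collection sketch follows; the output directory name is an assumption, and the recovery group names are simply the six shown in the listing above:

#!/bin/bash
# Collect the listings referenced in this thread from one GSS/NSD server.
OUT=/tmp/gss-diag.$(date +%Y%m%d-%H%M%S)     # assumed output location
mkdir -p "$OUT"

mmlsconfig      > "$OUT/mmlsconfig.txt"      # cluster configuration
mmlsfs all      > "$OUT/mmlsfs-all.txt"      # file system attributes
mmdiag --config > "$OUT/mmdiag-config.txt"   # effective daemon settings

# Per recovery group detail, including pdisk free space and state
for rg in gss01a gss01b gss02a gss02b gss03a gss03b; do
    mmlsrecoverygroup "$rg" -L --pdisk > "$OUT/mmlsrecoverygroup-$rg.txt"
done

echo "Listings written to $OUT"
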
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically > in the same row, and due to network constraints was not possible to > put them in the same subnet). The packets are routed, but should not > be a problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zgiles at gmail.com Tue Oct 14 18:32:50 2014 From: zgiles at gmail.com (Zachary Giles) Date: Tue, 14 Oct 2014 13:32:50 -0400 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. >> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. 
>> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. >> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? 
>> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. 
We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> >> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA27A00 
waiting 0.110025022 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, >> NSDThread: 
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
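
One low-impact way to tell whether these log-append waiters are a short burst or a sustained backlog is to sample them over time on the affected server. A rough sketch, where the 5-second interval and 12 samples are assumptions rather than anything from the thread:

# Count NSD threads blocked on VdiskLogAppendCondvar every 5 seconds
for i in $(seq 1 12); do
    printf '%s  blocked=%s\n' "$(date '+%H:%M:%S')" \
        "$(mmdiag --waiters | grep -c VdiskLogAppendCondvar)"
    sleep 5
done

A count that stays well above zero across samples points at sustained contention on the fast write log rather than an isolated spike.
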
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 10:38:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk> Message-ID: i personally don't know, i am in GPFS Research, not in support :-) but have you tried to contact your sales rep ? if you are not successful with that, shoot me a direct email with details about your company name, country and customer number and i try to get you somebody to help. thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Zachary Giles To: gpfsug main discussion list Date: 10/14/2014 10:33 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org Except that AFAIK no one has published how to update GSS or where the update code is.. All I've heard is "contact your sales rep". Any pointers? On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote: > you basically run GSS 1.0 code , while in the current version is GSS 2.0 > (which replaced Version 1.5 2 month ago) > > GSS 1.5 and 2.0 have several enhancements in this space so i strongly > encourage you to upgrade your systems. > > if you can specify a bit what your workload is there might also be > additional knobs we can turn to change the behavior. > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM: > >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 09:40 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> Thanks in advance for your help. 
>> >> We have 6 RG: > >> recovery group vdisks vdisks servers >> ------------------ ----------- ------ ------- >> gss01a 4 8 >> gss01a.ebi.ac.uk,gss01b.ebi.ac.uk >> gss01b 4 8 >> gss01b.ebi.ac.uk,gss01a.ebi.ac.uk >> gss02a 4 8 >> gss02a.ebi.ac.uk,gss02b.ebi.ac.uk >> gss02b 4 8 >> gss02b.ebi.ac.uk,gss02a.ebi.ac.uk >> gss03a 4 8 >> gss03a.ebi.ac.uk,gss03b.ebi.ac.uk >> gss03b 4 8 >> gss03b.ebi.ac.uk,gss03a.ebi.ac.uk >> >> Check the attached file for RG details. >> Following mmlsconfig: > >> [root at gss01a ~]# mmlsconfig >> Configuration data for cluster GSS.ebi.ac.uk: >> --------------------------------------------- >> myNodeConfigNumber 1 >> clusterName GSS.ebi.ac.uk >> clusterId 17987981184946329605 >> autoload no >> dmapiFileHandleSize 32 >> minReleaseLevel 3.5.0.11 >> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] >> pagepool 38g >> nsdRAIDBufferPoolSizePct 80 >> maxBufferDescs 2m >> numaMemoryInterleave yes >> prefetchPct 5 >> maxblocksize 16m >> nsdRAIDTracks 128k >> ioHistorySize 64k >> nsdRAIDSmallBufferSize 256k >> nsdMaxWorkerThreads 3k >> nsdMinWorkerThreads 3k >> nsdRAIDSmallThreadRatio 2 >> nsdRAIDThreadsPerQueue 16 >> nsdClientCksumTypeLocal ck64 >> nsdClientCksumTypeRemote ck64 >> nsdRAIDEventLogToConsole all >> nsdRAIDFastWriteFSDataLimit 64k >> nsdRAIDFastWriteFSMetadataLimit 256k >> nsdRAIDReconstructAggressiveness 1 >> nsdRAIDFlusherBuffersLowWatermarkPct 20 >> nsdRAIDFlusherBuffersLimitPct 80 >> nsdRAIDFlusherTracksLowWatermarkPct 20 >> nsdRAIDFlusherTracksLimitPct 80 >> nsdRAIDFlusherFWLogHighWatermarkMB 1000 >> nsdRAIDFlusherFWLogLimitMB 5000 >> nsdRAIDFlusherThreadsLowWatermark 1 >> nsdRAIDFlusherThreadsHighWatermark 512 >> nsdRAIDBlockDeviceMaxSectorsKB 4096 >> nsdRAIDBlockDeviceNrRequests 32 >> nsdRAIDBlockDeviceQueueDepth 16 >> nsdRAIDBlockDeviceScheduler deadline >> nsdRAIDMaxTransientStale2FT 1 >> nsdRAIDMaxTransientStale3FT 1 >> syncWorkerThreads 256 >> tscWorkerPool 64 >> nsdInlineWriteMax 32k >> maxFilesToCache 12k >> maxStatCache 512 >> maxGeneralThreads 1280 >> flushedDataTarget 1024 >> flushedInodeTarget 1024 >> maxFileCleaners 1024 >> maxBufferCleaners 1024 >> logBufferCount 20 >> logWrapAmountPct 2 >> logWrapThreads 128 >> maxAllocRegionsPerNode 32 >> maxBackgroundDeletionThreads 16 >> maxInodeDeallocPrefetch 128 >> maxMBpS 16000 >> maxReceiverThreads 128 >> worker1Threads 1024 >> worker3Threads 32 >> [common] >> cipherList AUTHONLY >> socketMaxListenConnections 1500 >> failureDetectionTime 60 >> [common] >> adminMode central >> >> File systems in cluster GSS.ebi.ac.uk: >> -------------------------------------- >> /dev/gpfs1 > >> For more configuration paramenters i also attached a file with the >> complete output of mmdiag --config. 
>> >> >> and mmlsfs: >> >> File system attributes for /dev/gpfs1: >> ====================================== >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 32768 Minimum fragment size >> in bytes (system pool) >> 262144 Minimum fragment size >> in bytes (other pools) >> -i 512 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of >> metadata replicas >> -M 2 Maximum number of >> metadata replicas >> -r 1 Default number of data >> replicas >> -R 2 Maximum number of data >> replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in >> effect >> -k all ACL semantics in effect >> -n 1000 Estimated number of >> nodes that will mount file system >> -B 1048576 Block size (system pool) >> 8388608 Block size (other pools) >> -Q user;group;fileset Quotas enforced >> user;group;fileset Default quotas enabled >> --filesetdf no Fileset df enabled? >> -V 13.23 (3.5.0.7) File system version >> --create-time Tue Mar 18 16:01:24 2014 File system creation time >> -u yes Support for large LUNs? >> -z no Is DMAPI enabled? >> -L 4194304 Logfile size >> -E yes Exact mtime mount option >> -S yes Suppress atime mount option >> -K whenpossible Strict replica allocation >> option >> --fastea yes Fast external attributes >> enabled? >> --inode-limit 134217728 Maximum number of inodes >> -P system;data Disk storage pools in file >> system >> -d >> >> gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; >> -d >> >> gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; >> -d >> >> gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; >> -d >> >> gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; >> -d >> >> gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 >> Disks in file system >> --perfileset-quota no Per-fileset quota enforcement >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs1 Default mount point >> --mount-priority 0 Mount priority >> >> >> Regards, >> Salvatore >> > >> On 14/10/14 17:22, Sven Oehme wrote: >> your GSS code version is very backlevel. >> >> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk >> as well as mmlsconfig and mmlsfs all >> >> thx. 
Sven >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug-discuss at gpfsug.org >> Date: 10/14/2014 08:23 AM >> Subject: Re: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> >> On 14/10/14 15:51, Sven Oehme wrote: >> it means there is contention on inserting data into the fast write >> log on the GSS Node, which could be config or workload related >> what GSS code version are you running >> [root at ebi5-251 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "3.5.0-11 efix1 (888041)". >> Built on Jul 9 2013 at 18:03:32 >> Running 6 days 2 hours 10 minutes 35 secs >> >> >> >> and how are the nodes connected with each other (Ethernet or IB) ? >> ethernet. they use the same bonding (4x10Gb/s) where the data is >> passing. We don't have admin dedicated network >> >> [root at gss03a ~]# mmlscluster >> >> GPFS cluster information >> ======================== >> GPFS cluster name: GSS.ebi.ac.uk >> GPFS cluster id: 17987981184946329605 >> GPFS UID domain: GSS.ebi.ac.uk >> Remote shell command: /usr/bin/ssh >> Remote file copy command: /usr/bin/scp >> >> GPFS cluster configuration servers: >> ----------------------------------- >> Primary server: gss01a.ebi.ac.uk >> Secondary server: gss02b.ebi.ac.uk >> >> Node Daemon node name IP address Admin node name Designation >> ----------------------------------------------------------------------- >> 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager >> 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager >> 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager >> 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager >> 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager >> 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager >> >> >> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different >> subnet because of datacenter constraints ( They are not physically >> in the same row, and due to network constraints was not possible to >> put them in the same subnet). The packets are routed, but should not >> be a problem as there is 160Gb/s bandwidth between them. >> >> Regards, >> Salvatore >> >> >> >> ------------------------------------------ >> Sven Oehme >> Scalable Storage Research >> email: oehmes at us.ibm.com >> Phone: +1 (408) 824-8904 >> IBM Almaden Research Lab >> ------------------------------------------ >> >> >> >> From: Salvatore Di Nardo >> To: gpfsug main discussion list >> Date: 10/14/2014 07:40 AM >> Subject: [gpfsug-discuss] wait for permission to append to log >> Sent by: gpfsug-discuss-bounces at gpfsug.org >> >> >> >> hello all, >> could someone explain me the meaning of those waiters? 
>> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>>
>> Does it mean that the vdisk logs are struggling?
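For anyone trying to make sense of a waiter dump like the one above, it can help to tally the waiters by reason before drawing conclusions. A rough, illustrative way to do that from a shell -- assuming the output of 'mmdiag --waiters' has been saved to a file named waiters.txt, which is a made-up name here -- is:

# count waiters per wait reason
grep -o "reason '[^']*'" waiters.txt | sort | uniq -c | sort -rn

# show the longest individual waits
grep -o 'waiting [0-9.]* seconds' waiters.txt | sort -k2,2 -rn | head

If nearly every NSD worker thread is queued on the same VdiskLogAppendCondvar, as in the output above, that would suggest the recovery group's log vdisk is the shared choke point rather than the data vdisks themselves, although only the GSS/GNR counters can really confirm that.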
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
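For readers following the CTDB discussion who have not built one of these clusters, the moving parts are fairly small. A minimal sketch of a clustered Samba-on-GPFS setup, using CTDB 2.x era option names, might look roughly like the following. The addresses, interface names, netbios name and lock path are invented examples (not Kingston's actual settings), and these snippets are meant to be merged into existing configuration rather than pasted blindly:

# Sketch only: ctdb needs a recovery lock on storage every node can see, i.e. on the GPFS filesystem
cat > /etc/sysconfig/ctdb <<'EOF'
CTDB_RECOVERY_LOCK=/gpfs/ctdb/.ctdb.lock
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_NFS=yes
EOF

# Private cluster addresses of every node, identical file on all nodes
cat > /etc/ctdb/nodes <<'EOF'
10.10.104.1
10.10.104.2
EOF

# Floating service IPs that ctdb moves between healthy nodes; different VLANs can be tied to different NICs
cat > /etc/ctdb/public_addresses <<'EOF'
192.168.104.10/24 eth0
192.168.103.10/24 eth1
EOF

# smb.conf: the essential part is "clustering = yes" so all smbd instances share their TDBs via ctdb
cat >> /etc/samba/smb.conf <<'EOF'
[global]
    clustering = yes
    netbios name = GPFS-NAS
    security = ads
EOF

That is only the skeleton; the AD join, id mapping and the exported shares are where most of the site-specific work lives.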
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
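Since "File Heat" comes up a few times in this thread, here is a rough sketch of what that route looks like in practice. Everything below is illustrative only: the period and loss values, the policy file name, the list prefix and the file system name (gpfs01) are invented, and the usual caveat applies -- a policy scan is still an in-band metadata workload, which is exactly the objection raised above.

# enable file heat tracking (heat decays over the configured period)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# a small policy that lists files weighted by their heat
cat > /tmp/hotfiles.pol <<'EOF'
RULE EXTERNAL LIST 'hotfiles' EXEC ''
RULE 'hot' LIST 'hotfiles' WEIGHT(FILE_HEAT) SHOW(VARCHAR(FILE_HEAT))
EOF

# run the scan; -I defer only writes the candidate lists (prefixed /tmp/hotlist) without acting on them
mmapplypolicy gpfs01 -P /tmp/hotfiles.pol -I defer -f /tmp/hotlist

The candidate list written for the LIST rule can then be sorted on the heat column to get the "hot files" view discussed above.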
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year.
Here is the basic agenda that was provided:

GPFS/Elastic Storage User Group
Monday, November 17, 2014

3:00 PM-5:00 PM: GPFS/Elastic Storage User Group
- IBM Software Defined Storage strategy update
- Customer presentations
- Future directions such as object storage and OpenStack integration
- Elastic Storage server update
- Elastic Storage roadmap (*NDA required)

5:00 PM: Reception

Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance.

I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature. Thanks! -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 76 bytes Desc: image001.gif URL:
From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: Re: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org> Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: > > I reviewed my RFE request again and notice that it has been marked as > 'Private' and I think this is preventing people from voting on this > RFE.
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
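For anyone who wants to follow up on the File Heat suggestion quoted above, a minimal sketch of pulling a hot-file list with the policy engine once file heat tracking is enabled might look like the following. The period, loss percent, file system name and list name are illustrative assumptions, not values taken from this thread:

--
# Enable file heat tracking (example values only)
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# Policy that lists files weighted by FILE_HEAT
cat > hotfiles.pol <<'EOF'
RULE EXTERNAL LIST 'hot' EXEC ''
RULE 'listhot' LIST 'hot' WEIGHT(FILE_HEAT) SHOW(FILE_HEAT)
EOF

# Generate the list without moving any data; the result lands in /tmp/heat.list.hot
mmapplypolicy gpfs01 -P hotfiles.pol -f /tmp/heat -I defer
--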
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 24 Oct 2014 18:58:07 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com> It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem that I had with my configuration is that the NSD client cluster was not completely updated to GPFS 4.1.0-3, as there are a few nodes still running 3.5.0-20 in the cluster which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration "requirement" isn't documented in the Advanced Admin Guide, but it makes sense that this is required since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD Client cluster and the configuration works as desired. Thanks Kalyan and others for their feedback. Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one Gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 "Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node. Thanks!
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
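As an aside for readers following along: even with more flush threads and multiple gateway nodes configured, the prefetch itself is still driven by one job per fileset, fed with the file list produced by mmapplypolicy. A hedged sketch, where the fileset name matches the example above and the list path is an assumption:

--
# Hand the policy-generated file list to the single prefetch job for this fileset
mmafmctl fs1 prefetch -j prefetchIW --home-inode-file /tmp/oldfs-files.list
--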
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
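A rough sketch of the parallel rsync approach mentioned just above, splitting the old file system's top-level directories across concurrent workers; the paths and worker count are hypothetical, and a final catch-up pass (plus -A/-X if ACLs and extended attributes matter) would still be needed during the cutover window:

--
# Run 8 rsync workers, one per top-level directory of the old file system
cd /gpfs_old/projects
ls -d */ | xargs -n 1 -P 8 -I{} \
    rsync -a --numeric-ids /gpfs_old/projects/{} /gpfs_new/projects/{}
--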
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Wed Oct 29 13:59:40 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 29 Oct 2014 13:59:40 +0000 Subject: [gpfsug-discuss] Storagebeers, Nov 13th Message-ID: <5450F2CC.3070302@gpfsug.org> Hello all, I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members. http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/ I'll be popping along. Hopefully see you there. Jez From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 15:31:31 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
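As a side note on the nsddevices user exit shown in the quoted output above, and on Sven's earlier point about nsddevices files that merely print a static list: a dynamically generated variant, loosely based on the sample shipped as /usr/lpp/mmfs/samples/nsddevices.sample, might look like the sketch below. The dm-multipath alias pattern is an assumption for this particular DCS3850 setup:

--
# /var/mmfs/etc/nsddevices -- build the candidate device list on the fly
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    [ -e "$dev" ] && echo "mapper/$(basename "$dev") dmm"
done
# The script is sourced by the GPFS discovery code, hence 'return':
# returning 0 means use only the devices printed above, while a non-zero
# return also lets the built-in device discovery run.
return 0
--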
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. 
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
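To make Sven's DiskGPTRecovery suggestion above concrete, here is a sketch of querying and changing that BIOS setting with IBM's Advanced Settings Utility. The value name "None" is an assumption read from the showvalues output he posted; confirm the allowed values on your own hardware before setting anything, and remember his caveat that every server with access to the disks needs the change.

--
# Query the current value and its allowed values (as in Sven's example above)
asu64 showvalues DiskGPTRecovery.DiskGPTRecovery
# Then set it; "None" is assumed from the output shown in this thread,
# so verify it against your own showvalues listing first.
asu64 set DiskGPTRecovery.DiskGPTRecovery None
--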
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
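A quick sketch of Ed's /var/mmfs/gen/mmsdrfs suggestion above, for the case where a node has merely lost its local GPFS configuration; note that this does not repair on-disk NSD descriptors, which is the separate problem being chased in this thread. The node name is a placeholder, and the mmsdrrestore options should be checked against the man page before use.

--
# Copy the cluster configuration file from a node that still has a good copy
scp goodnode:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
# ...or have GPFS restore it for you (goodnode is a placeholder name)
mmsdrrestore -p goodnode
--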
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I?ve created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don?t seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
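Sven asks further down for the content of the nsddevices user exit, so for readers of the archive here is a rough sketch of a script that would produce the "mapper/... dmm" listing shown above. It is a guess modelled on the sample shipped with GPFS, not Jared's actual script, and the device name pattern is an assumption taken from the multipath aliases in this thread.

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices (sketch): list device-mapper multipath devices
# for GPFS NSD discovery. Output format is "<device relative to /dev> <type>",
# where dmm means a device-mapper multipath device.
for dev in $(ls /dev/mapper | egrep 'dcs3800u31[ab]_lun[0-9]+'); do
    echo "mapper/$dev dmm"
done
# Returning 0 tells GPFS to use only the devices listed above and skip its
# built-in discovery (the "user return 0" Jared mentions); return non-zero
# instead if the built-in discovery should also run.
return 0
--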
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
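On the stanza files mentioned earlier in the thread, and the recreate-the-NSDs question that follows below: for reference, an mmcrnsd stanza usually looks something like the sketch here. The values are illustrative only, and whether re-running mmcrnsd over disks that already carry file system data is safe is exactly the question to put to IBM support rather than something to try from this example.

--
# Illustrative stanza only -- names, servers and usage below are placeholders
%nsd:
  device=/dev/mapper/dcs3800u31a_lun0
  nsd=dcs3800u31a_lun0
  servers=mminsd5,mminsd6
  usage=dataAndMetadata
  failureGroup=-1
  pool=system
--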
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
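An aside on the stable names in the multipath -l listing above: they normally come from alias entries in /etc/multipath.conf keyed on each LUN's WWID, roughly as sketched below. The WWID shown is copied from the output above for dcs3800u31a_lun0; the stanza itself is illustrative, not this site's actual configuration.

--
multipaths {
    multipath {
        # WWID taken from the multipath -l output above
        wwid  360080e500029600c000001da53cf7ec1
        alias dcs3800u31a_lun0
    }
    # ...one multipath { } entry per LUN, so the mapper names (and hence
    # the nsddevices listing) stay stable across reboots...
}
--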
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
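For anyone following the same debugging path, the dd check above extends naturally to every LUN at once. A rough sketch, assuming the /dev/mapper naming used in this thread (adjust the pattern for other setups); on a healthy NSD the grep should match the same kind of "NSD descriptor for ... created by GPFS" line shown above, while an empty result on every LUN points at overwritten disk headers rather than a discovery problem:

--
#!/bin/bash
# Sketch only: scan the DCS3800 LUNs under /dev/mapper and report whether the
# first 32 KiB of each still contains a GPFS NSD descriptor string.
# The "[ab]_lun" pattern matches the naming used in this thread; adjust as needed.
for dev in /dev/mapper/*[ab]_lun*; do
    if dd if="$dev" bs=1k count=32 2>/dev/null | strings | grep -q "NSD descriptor"; then
        echo "$dev: NSD descriptor present"
    else
        echo "$dev: NO NSD descriptor found"
    fi
done
--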
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 
active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
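For reference, a minimal sketch of the mmsdrfs recovery path described above, with placeholder hostnames; this assumes the configuration on the surviving NSD server is intact and only the local copy was lost. It restores cluster configuration only and cannot bring back NSD descriptors that have been overwritten on the LUNs themselves.

--
# Sketch only: restore the local GPFS configuration on a rebuilt node from a
# healthy peer (hostnames are placeholders).

# Option 1: copy the cluster configuration file directly, as described above.
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs

# Option 2: let GPFS pull the configuration from another node.
/usr/lpp/mmfs/bin/mmsdrrestore -p mminsd6

# Then start GPFS on the repaired node.
/usr/lpp/mmfs/bin/mmstartup
--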
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
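For readability, the nsddevices override posted above, unwrapped and with comments added; the behaviour is unchanged and the LUN naming is specific to this site:

--
#!/bin/ksh
# /var/mmfs/etc/nsddevices override from the message above, unwrapped for
# readability. Matches the device-mapper aliases for the DCS3800 LUNs.
CONTROLLER_REGEX='[ab]_lun[0-9]+'

# Emit each matching alias relative to /dev, followed by the GPFS device
# type ("dmm" for device-mapper multipath devices).
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
  echo mapper/$dev dmm
  #echo mapper/$dev generic
done

# Returning 0 tells GPFS to use only the devices listed above and to bypass
# its built-in discovery (/usr/lpp/mmfs/bin/mmdevdiscover).
return 0
--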
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. 
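A rough way to confirm that diagnosis on each LUN is to look for the GPT signature directly: a GPT header begins at byte offset 512 (assuming 512-byte logical sectors) with the eight ASCII characters "EFI PART", which is exactly the string that showed up in the dd output above. A sketch, using the same site-specific device pattern:

--
#!/bin/bash
# Sketch only: report which LUNs now carry a GPT header where the GPFS NSD
# descriptor used to live. Assumes 512-byte logical sectors and the
# /dev/mapper naming used earlier in the thread.
for dev in /dev/mapper/*[ab]_lun*; do
    sig=$(dd if="$dev" bs=1 skip=512 count=8 2>/dev/null)
    if [ "$sig" = "EFI PART" ]; then
        echo "$dev: GPT header present, NSD descriptor likely overwritten"
    else
        echo "$dev: no GPT signature at LBA 1"
    fi
done
--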
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
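For anyone hitting the same situation, a minimal read-only sketch of this kind of check, run across every LUN at once: the /dev/mapper glob is taken from the multipath output earlier in this thread and the two strings ("NSD descriptor" and "EFI PART") are the ones shown in the dd examples, so both would need adjusting for other environments:

--
#!/bin/ksh
# Read the first 32 KiB of each GPFS LUN and report whether a GPFS NSD
# descriptor is still present or a GPT signature has been written over it.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    out=$(dd if="$dev" bs=1k count=32 2>/dev/null | strings)
    if echo "$out" | grep -q "NSD descriptor"; then
        echo "$dev: NSD descriptor present"
    elif echo "$out" | grep -q "EFI PART"; then
        echo "$dev: GPT signature found, NSD descriptor probably overwritten"
    else
        echo "$dev: neither string found"
    fi
done
--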
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 20:32:25 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 20:32:25 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <54514ED9.9030604@buzzard.me.uk> On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 20:47:51 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:47:51 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <54514ED9.9030604@buzzard.me.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: Jonathan, which script are you talking about? Thanks, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard Sent: Wednesday, October 29, 2014 2:32 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On 29/10/14 20:25, Sven Oehme wrote: > Hi, > > based on what i see is your BIOS or FW update wiped the NSD descriptor > by restoring a GPT table on the start of a disk that shouldn't have a > GPT table to begin with as its under control of GPFS. > future releases of GPFS prevent this by writing our own GPT label to the > disks so other tools don't touch them, but that doesn't help in your > case any more. if you want this officially confirmed i would still open > a PMR, but at that point given that you don't seem to have any > production data on it from what i see in your response you should > recreate the filesystem. > However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again... JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. 
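A rough read-only way to look for the secondary copy mentioned above: GPT keeps a backup header in the last sector of the disk, so reading just that sector shows whether a stale copy is still there (device names again assumed from earlier in this thread):

--
#!/bin/ksh
# Check the last 512-byte sector of each LUN for a backup GPT header.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sectors=$(blockdev --getsz "$dev")   # device size in 512-byte sectors
    if dd if="$dev" bs=512 skip=$((sectors - 1)) count=1 2>/dev/null | strings | grep -q "EFI PART"; then
        echo "$dev: backup GPT header present at end of disk"
    else
        echo "$dev: no backup GPT header found"
    fi
done
--

Actually clearing any stale labels is a separate, destructive step (the script Jonathan refers to, or generic tools such as wipefs or sgdisk) and should only be done once it is certain the LUN belongs to the damaged file system.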
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Wed Oct 29 21:01:06 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 21:01:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk> Message-ID: <54515592.4050606@buzzard.me.uk> On 29/10/14 20:47, Jared David Baker wrote: > Jonathan, which script are you talking about? > The one here https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25 Use for detecting and clearing that secondary GPT table. Never used it of course, my disaster was caused by an idiot admin installing a new OS not mapping the disks out and then hit yes yes yes when asked if he wanted to blank the disks, the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM 50TB and 80 million files down the swanny. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From mark.bergman at uphs.upenn.edu Fri Oct 31 17:10:55 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 13:10:55 -0400 Subject: [gpfsug-discuss] mapping to hostname? Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm> Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark From bevans at pixitmedia.com Fri Oct 31 17:32:45 2014 From: bevans at pixitmedia.com (Barry Evans) Date: Fri, 31 Oct 2014 17:32:45 +0000 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <5453C7BD.8030608@pixitmedia.com> I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout. Cheers, Barry ArcaStream/Pixit Media mark.bergman at uphs.upenn.edu wrote: > Many GPFS logs& utilities refer to nodes via their name. > > I haven't found an "mm*" executable that shows the mapping between that > name an the hostname. > > Is there a simple method to map the designation to the node's > hostname? > > Thanks, > > Mark > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. 
From oehmes at us.ibm.com Fri Oct 31 18:20:40 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Fri, 31 Oct 2014 11:20:40 -0700 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm> References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: Hi, the official way to do this is mmdiag --network thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: mark.bergman at uphs.upenn.edu To: gpfsug main discussion list Date: 10/31/2014 10:11 AM Subject: [gpfsug-discuss] mapping to hostname? Sent by: gpfsug-discuss-bounces at gpfsug.org Many GPFS logs & utilities refer to nodes via their name. I haven't found an "mm*" executable that shows the mapping between that name an the hostname. Is there a simple method to map the designation to the node's hostname? Thanks, Mark _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.bergman at uphs.upenn.edu Fri Oct 31 18:57:44 2014 From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu) Date: Fri, 31 Oct 2014 14:57:44 -0400 Subject: [gpfsug-discuss] mapping to hostname? In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700." References: <25152-1414775455.156309@Pc2q.WYui.XCNm> Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi> In the message dated: Fri, 31 Oct 2014 11:20:40 -0700, The pithy ruminations from Sven Oehme on to hostname?> were: => Hi, => => the official way to do this is mmdiag --network OK. I'm now using: mmdiag --network | awk '{if ( $1 ~ / => thx. Sven => => => ------------------------------------------ => Sven Oehme => Scalable Storage Research => email: oehmes at us.ibm.com => Phone: +1 (408) 824-8904 => IBM Almaden Research Lab => ------------------------------------------ => => => => From: mark.bergman at uphs.upenn.edu => To: gpfsug main discussion list => Date: 10/31/2014 10:11 AM => Subject: [gpfsug-discuss] mapping to hostname? => Sent by: gpfsug-discuss-bounces at gpfsug.org => => => => Many GPFS logs & utilities refer to nodes via their name. => => I haven't found an "mm*" executable that shows the mapping between that => name an the hostname. => => Is there a simple method to map the designation to the node's => hostname? => => Thanks, => => Mark => From stuartb at 4gh.net Fri Oct 3 18:19:08 2014 From: stuartb at 4gh.net (Stuart Barkley) Date: Fri, 3 Oct 2014 13:19:08 -0400 (EDT) Subject: [gpfsug-discuss] filesets and mountpoint naming Message-ID: Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. 
We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From bbanister at jumptrading.com Mon Oct 6 16:17:44 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 15:17:44 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Friday, October 03, 2014 12:19 PM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bbanister at jumptrading.com Mon Oct 6 16:36:17 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 15:36:17 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch -j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the "--home-inode-file" from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sandra.McLaughlin at astrazeneca.com Mon Oct 6 16:40:45 2014 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Mon, 6 Oct 2014 15:40:45 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <5ed81d7bfbc94873aa804cfc807d5858@DBXPR04MB031.eurprd04.prod.outlook.com> Hi Stuart, We have a very similar setup. I use /gpfs01, /gpfs02 etc. 
and then use filesets within those, and symbolic links on the gpfs cluster members to give the same user experience combined with automounter maps (we have a large number of NFS clients as well as cluster members). This all works quite well. Regards, Sandra -------------------------------------------------------------------------- AstraZeneca UK Limited is a company incorporated in England and Wales with registered number: 03674842 and a registered office at 2 Kingdom Street, London, W2 6BD. Confidentiality Notice: This message is private and may contain confidential, proprietary and legally privileged information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorised use or disclosure of the contents of this message is not permitted and may be unlawful. Disclaimer: Email messages may be subject to delays, interception, non-delivery and unauthorised alterations. Therefore, information expressed in this message is not given or endorsed by AstraZeneca UK Limited unless otherwise notified by an authorised representative independent of this message. No contractual relationship is created by this message by any person unless specifically indicated by agreement in writing other than email. Monitoring: AstraZeneca UK Limited may monitor email traffic data and content for the purposes of the prevention and detection of crime, ensuring the security of our computer systems and checking Compliance with our Code of Conduct and Policies. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: 23 September 2014 16:47 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From zgiles at gmail.com Mon Oct 6 16:42:56 2014 From: zgiles at gmail.com (Zachary Giles) Date: Mon, 6 Oct 2014 11:42:56 -0400 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCB97@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Here we have just one large GPFS file system with many file sets inside. 
We mount it under /sc/something (sc for scientific computing). We user the /sc/ as we previously had another GPFS file system while migrating from one to the other. It's pretty easy and straight forward to have just one file system.. eases administration and mounting. You can make symlinks.. like /scratch -> /sc/something/scratch/ if you want. We did that, and it's how most of our users got to the system for a long time. We even remounted the GPFS file system from where DDN left it at install time ( /gs01 ) to /sc/gs01, updated the symlink, and the users never knew. Multicluster for compute nodes separate from the FS cluster. YMMV depending on if you want to allow everyone to mount your file system or not. I know some people don't. We only admin our own boxes and no one else does, so it works best this way for us given the ideal scenario. On Mon, Oct 6, 2014 at 11:17 AM, Bryan Banister wrote: > There is a general system administration idiom that states you should avoid mounting file systems at the root directory (e.g. /) to avoid any problems with response to administrative commands in the root directory (e.g. ls, stat, etc) if there is a file system issue that would cause these commands to hang. > > Beyond that the directory and file system naming scheme is really dependent on how your organization wants to manage the environment. Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley > Sent: Friday, October 03, 2014 12:19 PM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] filesets and mountpoint naming > > Resent: First copy sent Sept 23. Maybe stuck in a moderation queue? > > When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: > > /home > /scratch > /projects > /reference > /applications > > We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). > > We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. > > We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. > > Some thoughts are to just do filesystems as: > > /gpfs01, /gpfs02, etc. > /mnt/gpfs01, etc > /mnt/clustera/gpfs01, etc. > > What have other people done? Are you happy with it? What would you do differently? > > Thanks, > Stuart > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Zach Giles zgiles at gmail.com From oehmes at gmail.com Mon Oct 6 17:27:58 2014 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 6 Oct 2014 09:27:58 -0700 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: > Just an FYI to the GPFS user community, > > > > We have been testing out GPFS AFM file systems in our required process of > file data migration between two GPFS file systems. The two GPFS file > systems are managed in two separate GPFS clusters. We have a third GPFS > cluster for compute systems. We created new independent AFM filesets in > the new GPFS file system that are linked to directories in the old file > system. Unfortunately access to the AFM filesets from the compute cluster > completely hang. Access to the other parts of the second file system is > fine. This limitation/issue is not documented in the Advanced Admin Guide. > > > > Further, we performed prefetch operations using a file mmafmctl command, > but the process appears to be single threaded and the operation was > extremely slow as a result. According to the Advanced Admin Guide, it is > not possible to run multiple prefetch jobs on the same fileset: > > GPFS can prefetch the data using the *mmafmctl **Device **prefetch ?j **FilesetName > *command (which specifies > > a list of files to prefetch). Note the following about prefetching: > > v It can be run in parallel on multiple filesets (although more than one > prefetching job cannot be run in > > parallel on a single fileset). > > > > We were able to quickly create the ?--home-inode-file? from the old file > system using the mmapplypolicy command as the documentation describes. > However the AFM prefetch operation is so slow that we are better off > running parallel rsync operations between the file systems versus using the > GPFS AFM prefetch operation. > > > > Cheers, > > -Bryan > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. 
This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Mon Oct 6 17:30:02 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Mon, 6 Oct 2014 16:30:02 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. Cheers, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kgunda at in.ibm.com Tue Oct 7 06:03:07 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 10:33:07 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Tue Oct 7 15:44:48 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 7 Oct 2014 14:44:48 +0000 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. 
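(As a side note, a minimal sketch of the kind of waiter check described above, in case anyone wants to reproduce the hang; the file system and fileset names are hypothetical.)

# Dump current waiters on the local node
/usr/lpp/mmfs/bin/mmdiag --waiters

# Sweep all nodes in the cluster for long waiters (repeat on each cluster)
/usr/lpp/mmfs/bin/mmdsh -N all '/usr/lpp/mmfs/bin/mmdiag --waiters'

# Show the AFM fileset state and the gateway node handling its queue
mmafmctl newfs getstate -j cachefileset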
However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
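(For reference, a rough illustration of the parallel rsync namespace chopping mentioned in the message above under point 4. The directory layout, node names and degree of parallelism are made up, and it assumes simple directory names without spaces; a real migration would need better balancing and error handling.)

# Split the namespace at the top-level directory boundary of the old file system
cd /gpfs/oldfs/projects && ls -d */ > /tmp/dirs.txt

# Run up to 8 rsyncs in parallel from a single node
xargs -a /tmp/dirs.txt -P 8 -I{} \
    rsync -a /gpfs/oldfs/projects/{} /gpfs/newfs/projects/{}

# Or fan the directory list out round-robin across several nodes
i=0
for d in $(cat /tmp/dirs.txt); do
    node="node$(( i % 4 ))"; i=$(( i + 1 ))
    ssh "$node" "rsync -a /gpfs/oldfs/projects/$d /gpfs/newfs/projects/$d" &
done
wait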
From kgunda at in.ibm.com Tue Oct 7 16:20:30 2014 From: kgunda at in.ibm.com (Kalyan Gunda) Date: Tue, 7 Oct 2014 20:50:30 +0530 Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! 
--> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) 
eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister To: gpfsug main discussion list Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
Cheers, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sdinardo at ebi.ac.uk Thu Oct 9 13:02:44 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Thu, 09 Oct 2014 13:02:44 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
Message-ID: <54367964.1050900@ebi.ac.uk> Hello everyone, Suppose we want to build a new GPFS storage system using SAN-attached storage, but instead of putting metadata on shared storage, we want to use FusionIO PCI cards locally in the servers to speed up metadata operations ( http://www.fusionio.com/products/iodrive ) and, for reliability, replicate the metadata across all the servers. Will this work in case of a server failure? To make it clearer: if a server fails, I will also lose a metadata vdisk. Is the replica mechanism reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 9 20:31:28 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 9 Oct 2014 19:31:28 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Just wanted to pass my GPFS RFE along: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 Description: GPFS File System Manager should provide the option to log all file and directory operations that occur in a file system, preferably stored in a TSD (Time Series Database) that could be quickly queried through an API interface and command line tools. This would allow many required file system management operations to obtain the change log of a file system namespace without having to use the GPFS ILM policy engine to search all file system metadata for changes, and would not need to run massive differential comparisons of file system namespace snapshots to determine what files have been modified, deleted, added, etc. It would be doubly great if this could be controlled on a per-fileset basis. Use case: This could be used for a very large number of file system management applications, including: 1) SOBAR (Scale-Out Backup And Restore) 2) Data Security Auditing and Monitoring applications 3) Async Replication of namespace between GPFS file systems without the requirement of AFM, which must use ILM policies that add unnecessary workload to metadata resources. 4) Application file system access profiling Please vote for it if you feel it would also benefit your operation, thanks, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From service at metamodul.com Fri Oct 10 13:21:43 2014 From: service at metamodul.com (service at metamodul.com) Date: Fri, 10 Oct 2014 14:21:43 +0200 (CEST) Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <937639307.291563.1412943703119.JavaMail.open-xchange@oxbaltgw12.schlund.de> > Bryan Banister hat am 9. Oktober 2014 um 21:31 > geschrieben: > > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgp at psu.edu Fri Oct 10 16:04:02 2014 From: pgp at psu.edu (Phil Pishioneri) Date: Fri, 10 Oct 2014 11:04:02 -0400 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <5437F562.1080609@psu.edu> On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil From bbanister at jumptrading.com Fri Oct 10 16:08:04 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 15:08:04 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <5437F562.1080609@psu.edu> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From bdeluca at gmail.com Fri Oct 10 16:26:40 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Fri, 10 Oct 2014 23:26:40 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister wrote: > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! 
> -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Oct 10 16:51:51 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 08:51:51 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > >> Hmm... I didn't think to use the DMAPI interface. That could be a nice >> option. 
Has anybody done this already and are there any examples we could >> look at? >> >> Thanks! >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >> Sent: Friday, October 10, 2014 10:04 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >> >> On 10/9/14 3:31 PM, Bryan Banister wrote: >> > >> > Just wanted to pass my GPFS RFE along: >> > >> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >> > 0458 >> > >> > >> > *Description*: >> > >> > GPFS File System Manager should provide the option to log all file and >> > directory operations that occur in a file system, preferably stored in >> > a TSD (Time Series Database) that could be quickly queried through an >> > API interface and command line tools. ... >> > >> >> The rudimentaries for this already exist via the DMAPI interface in GPFS >> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >> DeveloperWorks forum: >> >> On 1/3/11 10:27 AM, dWForums wrote: >> > Author: >> > AlokK.Dhir >> > >> > Message: >> > We have a proof of concept which uses DMAPI to listens to and passively >> logs filesystem changes with a non blocking listener. This log can be used >> to generate backup sets etc. Unfortunately, a bug in the current DMAPI >> keeps this approach from working in the case of certain events. I am told >> 3.4.0.3 may contain a fix. We will gladly share the code once it is >> working. >> >> -Phil >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, and >> to please notify the sender immediately and destroy this email and any >> attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to the >> completeness or accuracy of this email or any attachments. This email is >> for informational purposes only and does not constitute a recommendation, >> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >> or perform any type of transaction of a financial product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:02:09 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:02:09 +0000 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
In-Reply-To: <54367964.1050900@ebi.ac.uk> References: <54367964.1050900@ebi.ac.uk> Message-ID: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Hi Salvatore, We've done this before (non-shared metadata NSDs with GPFS 4.1) and noted these constraints: * Filesystem descriptor quorum: since it will be easier to have a metadata disk go offline, it's even more important to have three failure groups with FusionIO metadata NSDs in two, and at least a desc_only NSD in the third one. You may even want to explore having three full metadata replicas on FusionIO. (Or perhaps if your workload can tolerate it the third one can be slower but in another GPFS "subnet" so that it isn't used for reads.) * Make sure to set the correct default metadata replicas in your filesystem, corresponding to the number of metadata failure groups you set up. When a metadata server goes offline, it will take the metadata disks with it, and you want a replica of the metadata to be available. * When a metadata server goes offline and comes back up (after a maintenance reboot, for example), the non-shared metadata disks will be stopped. Until those are brought back into a well-known replicated state, you are at risk of a cluster-wide filesystem unmount if there is a subsequent metadata disk failure. But GPFS will continue to work, by default, allowing reads and writes against the remaining metadata replica. You must detect that disks are stopped (e.g. mmlsdisk) and restart them (e.g. with mmchdisk start ?a). I haven't seen anyone "recommend" running non-shared disk like this, and I wouldn't do this for things which can't afford to go offline unexpectedly and require a little more operational attention. But it does appear to work. Thx Paul Sanchez From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Salvatore Di Nardo Sent: Thursday, October 09, 2014 8:03 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? Hello everyone, Suppose we want to build a new GPFS storage using SAN attached storages, but instead to put metadata in a shared storage, we want to use FusionIO PCI cards locally on the servers to speed up metadata operation( http://www.fusionio.com/products/iodrive) and for reliability, replicate the metadata in all the servers, will this work in case of server failure? To make it more clear: If a server fail i will loose also a metadata vdisk. Its the replica mechanism its reliable enough to avoid metadata corruption and loss of data? Thanks in advance Salvatore Di Nardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Fri Oct 10 17:05:03 2014 From: oester at gmail.com (Bob Oesterlin) Date: Fri, 10 Oct 2014 11:05:03 -0500 Subject: [gpfsug-discuss] GPFS File Heat Message-ID: As Sven suggests, this is easy to gather once you turn on file heat. I run this heat.pol file against a file systems to gather the values: -- heat.pol -- define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) END]) rule fh1 external list 'fh' exec '' rule fh2 list 'fh' weight(FILE_HEAT) show( DISPLAY_NULL(FILE_HEAT) || '|' || varchar(file_size) ) -- heat.pol -- Produces output similar to this: /gpfs/.../specFile.pyc 535089836 5892 /gpfs/.../syspath.py 528685287 806 /gpfs/---/bwe.py 528160670 4607 Actual GPFS file path redacted :) After that it's a relatively straightforward process to go thru the values. 
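(A possible way to drive the heat.pol policy above and rank the results; the file system name and output paths are hypothetical, and the exact name of the deferred list file should be checked for the release in use.)

# FILE_HEAT values only accumulate once heat tracking is enabled, e.g.:
mmchconfig fileHeatPeriodMinutes=1440

# Build the candidate list without executing anything; with -I defer and
# -f /tmp/heat the list for the 'fh' rule typically lands in /tmp/heat.list.fh
mmapplypolicy gpfs01 -P heat.pol -I defer -f /tmp/heat

# Once the list is reduced to the "path heat size" form shown above,
# the hottest files are simply the largest values in the second column
sort -g -r -k2,2 /tmp/heat.reduced | head -50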
There is no documentation on what the values really mean, but it does give you some overall indication of which files are getting the most hits. I have other information to share; drop me a note at my work email: robert.oesterlin at nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdeluca at gmail.com Fri Oct 10 17:09:49 2014 From: bdeluca at gmail.com (Ben De Luca) Date: Sat, 11 Oct 2014 00:09:49 +0800 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > >> Id like this to see hot files >> >> On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < >> bbanister at jumptrading.com> wrote: >> >>> Hmm... I didn't think to use the DMAPI interface. That could be a nice >>> option. Has anybody done this already and are there any examples we could >>> look at? >>> >>> Thanks! >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at gpfsug.org [mailto: >>> gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri >>> Sent: Friday, October 10, 2014 10:04 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] GPFS RFE promotion >>> >>> On 10/9/14 3:31 PM, Bryan Banister wrote: >>> > >>> > Just wanted to pass my GPFS RFE along: >>> > >>> > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 >>> > 0458 >>> > >>> > >>> > *Description*: >>> > >>> > GPFS File System Manager should provide the option to log all file and >>> > directory operations that occur in a file system, preferably stored in >>> > a TSD (Time Series Database) that could be quickly queried through an >>> > API interface and command line tools. ... >>> > >>> >>> The rudimentaries for this already exist via the DMAPI interface in GPFS >>> (used by the TSM HSM product). A while ago this was posted to the IBM GPFS >>> DeveloperWorks forum: >>> >>> On 1/3/11 10:27 AM, dWForums wrote: >>> > Author: >>> > AlokK.Dhir >>> > >>> > Message: >>> > We have a proof of concept which uses DMAPI to listens to and >>> passively logs filesystem changes with a non blocking listener. This log >>> can be used to generate backup sets etc. Unfortunately, a bug in the >>> current DMAPI keeps this approach from working in the case of certain >>> events. I am told 3.4.0.3 may contain a fix. We will gladly share the >>> code once it is working. >>> >>> -Phil >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged information. 
>>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, and >>> to please notify the sender immediately and destroy this email and any >>> attachments. Email transmission cannot be guaranteed to be secure or >>> error-free. The Company, therefore, does not make any guarantees as to the >>> completeness or accuracy of this email or any attachments. This email is >>> for informational purposes only and does not constitute a recommendation, >>> offer, request or solicitation of any kind to buy, sell, subscribe, redeem >>> or perform any type of transaction of a financial product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at gpfsug.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 17:15:22 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 16:15:22 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! 
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Fri Oct 10 17:24:32 2014 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Fri, 10 Oct 2014 16:24:32 +0000 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <201D6001C896B846A9CFC2E841986AC1451878D2@mailnycmb2a.winmail.deshaw.com> We've been mounting all filesystems in a canonical location and bind mounting filesets into the namespace. One gotcha that we recently encountered though was the selection of /gpfs as the root of the canonical mount path. (By default automountdir is set to /gpfs/automountdir, which made this seem like a good spot.) This seems to be where gpfs expects filesystems to be mounted, since there are some hardcoded references in the gpfs.base RPM %pre script (RHEL package for GPFS) which try to nudge processes off of the filesystems before yanking the mounts during an RPM version upgrade. This however may take an exceedingly long time, since it's doing an 'lsof +D /gpfs' which walks the filesystems. -Paul Sanchez -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Stuart Barkley Sent: Tuesday, September 23, 2014 11:47 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] filesets and mountpoint naming When we first started using GPFS we created several filesystems and just directly mounted them where seemed appropriate. We have something like: /home /scratch /projects /reference /applications We are finding the overhead of separate filesystems to be troublesome and are looking at using filesets inside fewer filesystems to accomplish our goals (we will probably keep /home separate for now). We can put symbolic links in place to provide the same user experience, but I'm looking for suggestions as to where to mount the actual gpfs filesystems. We have multiple compute clusters with multiple gpfs systems, one cluster has a traditional gpfs system and a separate gss system which will obviously need multiple mount points. We also want to consider possible future cross cluster mounts. Some thoughts are to just do filesystems as: /gpfs01, /gpfs02, etc. /mnt/gpfs01, etc /mnt/clustera/gpfs01, etc. What have other people done? Are you happy with it? What would you do differently? Thanks, Stuart -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Fri Oct 10 17:52:27 2014 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 10 Oct 2014 09:52:27 -0700 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. 
its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister wrote: > I agree with Ben, I think. > > > > I don?t want to use the ILM policy engine as that puts a direct workload > against the metadata storage and server resources. We need something > out-of-band, out of the file system operational path. > > > > Is there a simple DMAPI daemon that would log the file system namespace > changes that we could use? > > > > If so are there any limitations? > > > > And is it possible to set this up in an HA environment? > > > > Thanks! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > > > querying this through the policy engine is far to late to do any thing > useful with it > > > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme wrote: > > Ben, > > > > to get lists of 'Hot Files' turn File Heat on , some discussion about it > is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > > > thx. Sven > > > > > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca wrote: > > Id like this to see hot files > > > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister < > bbanister at jumptrading.com> wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a nice > option. Has anybody done this already and are there any examples we could > look at? > > Thanks! > -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto: > gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in GPFS > (used by the TSM HSM product). A while ago this was posted to the IBM GPFS > DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and passively > logs filesystem changes with a non blocking listener. This log can be used > to generate backup sets etc. Unfortunately, a bug in the current DMAPI > keeps this approach from working in the case of certain events. I am told > 3.4.0.3 may contain a fix. We will gladly share the code once it is > working. 
> > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. > If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 10 18:13:16 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Oct 2014 17:13:16 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) 
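The mount dependency mentioned above maps onto a few concrete settings. A minimal sketch, assuming a file system named gpfs01 (a placeholder) and quoting option names from memory, so check them against the Administration Guide for your GPFS level before use:

# Is DMAPI enabled on the file system? (gpfs01 is a placeholder device name)
mmlsfs gpfs01 -z

# DMAPI is a per-file-system flag; changing it normally requires the file
# system to be unmounted everywhere first:
mmchfs gpfs01 -z yes

# With DMAPI enabled, a mount may wait for a DM application to register its
# event disposition. dmapiMountTimeout (seconds) bounds that wait, so a
# missing or crashed daemon should delay mounts rather than block them
# indefinitely -- verify the exact behaviour on your code level:
mmchconfig dmapiMountTimeout=60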
Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. 
This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
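On the File Heat suggestion earlier in this thread: a rough sketch of enabling it and pulling a ranked hot-file list through the policy engine, with the caveat already raised that a policy scan is an after-the-fact snapshot rather than a live event feed. The file system name (gpfs01), the output path and the exact policy syntax are assumptions from memory rather than something verified on this cluster:

# Enable file heat tracking cluster-wide (tune the period/decay to taste):
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# hot.pol -- rank files by FILE_HEAT and emit them as an external list:
#   RULE EXTERNAL LIST 'hotfiles' EXEC ''
#   RULE 'hot' LIST 'hotfiles' WEIGHT(FILE_HEAT) SHOW(varchar(FILE_HEAT))

# Run the scan, defer any action, and just write the list under /tmp/hotfiles:
mmapplypolicy gpfs01 -P hot.pol -I defer -f /tmp/hotfiles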
-------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Sat Oct 11 10:37:10 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Sat, 11 Oct 2014 10:37:10 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <5438FA46.7090902@ebi.ac.uk> Thanks for your answer. Yes, the idea is to have 3 servers in 3 different failure groups. Each of them with a drive and set 3 metadata replica as the default one. I have not considered that the vdisks could be off after a 'reboot' or failure, so that's a good point, but anyway , after a failure or even a standard reboot, the server and the cluster have to be checked anyway, and i always check the vdisk status, so no big deal. Your answer made me consider also another thing... Once put them back online, they will be restriped automatically or should i run every time 'mmrestripefs' to verify/correct the replicas? I understand that use lodal disk sound strange, infact our first idea was just to add some ssd to the shared storage, but then we considered that the sas cable could be a huge bottleneck. The cost difference is not huge and the fusioio locally on the server would make the metadata just fly. On 10/10/14 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start ?a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable? 
> > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Sun Oct 12 17:03:56 2014 From: service at metamodul.com (MetaService) Date: Sun, 12 Oct 2014 18:03:56 +0200 Subject: [gpfsug-discuss] filesets and mountpoint naming In-Reply-To: References: Message-ID: <1413129836.4846.9.camel@titan> My preferred naming convention is to use the cluster name or part of it as the base directory for all GPFS mounts. Example: Clustername=c1_eum would mean that: /c1_eum/ would be the base directory for all Cluster c1_eum GPFSs In case a second local cluster would exist its root mount point would be /c2_eum/ Even in case of mounting remote clusters a naming collision is not very likely. BTW: For accessing the the final directories /.../scratch ... the user should not rely on the mount points but on given variables provided. CLS_HOME=/... CLS_SCRATCH=/.... hth Hajo From lhorrocks-barlow at ocf.co.uk Fri Oct 10 17:48:24 2014 From: lhorrocks-barlow at ocf.co.uk (Laurence Horrocks- Barlow) Date: Fri, 10 Oct 2014 17:48:24 +0100 Subject: [gpfsug-discuss] metadata vdisks on fusionio.. doable? In-Reply-To: <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> References: <54367964.1050900@ebi.ac.uk> <201D6001C896B846A9CFC2E841986AC145187803@mailnycmb2a.winmail.deshaw.com> Message-ID: <54380DD8.2020909@ocf.co.uk> Hi Salvatore, Just to add that when the local metadata disk fails or the server goes offline there will most likely be an I/O interruption/pause whist the GPFS cluster renegotiates. The main concept to be aware of (as Paul mentioned) is that when a disk goes offline it will appear down to GPFS, once you've started the disk again it will rediscover and scan the metadata for any missing updates, these updates are then repaired/replicated again. Laurence Horrocks-Barlow Linux Systems Software Engineer OCF plc Tel: +44 (0)114 257 2200 Fax: +44 (0)114 257 0022 Web: www.ocf.co.uk Blog: blog.ocf.co.uk Twitter: @ocfplc OCF plc is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us and remove it from your system. On 10/10/2014 17:02, Sanchez, Paul wrote: > > Hi Salvatore, > > We've done this before (non-shared metadata NSDs with GPFS 4.1) and > noted these constraints: > > * Filesystem descriptor quorum: since it will be easier to have a > metadata disk go offline, it's even more important to have three > failure groups with FusionIO metadata NSDs in two, and at least a > desc_only NSD in the third one. 
You may even want to explore having > three full metadata replicas on FusionIO. (Or perhaps if your workload > can tolerate it the third one can be slower but in another GPFS > "subnet" so that it isn't used for reads.) > > * Make sure to set the correct default metadata replicas in your > filesystem, corresponding to the number of metadata failure groups you > set up. When a metadata server goes offline, it will take the metadata > disks with it, and you want a replica of the metadata to be available. > > * When a metadata server goes offline and comes back up (after a > maintenance reboot, for example), the non-shared metadata disks will > be stopped. Until those are brought back into a well-known replicated > state, you are at risk of a cluster-wide filesystem unmount if there > is a subsequent metadata disk failure. But GPFS will continue to work, > by default, allowing reads and writes against the remaining metadata > replica. You must detect that disks are stopped (e.g. mmlsdisk) and > restart them (e.g. with mmchdisk start ?a). > > I haven't seen anyone "recommend" running non-shared disk like this, > and I wouldn't do this for things which can't afford to go offline > unexpectedly and require a little more operational attention. But it > does appear to work. > > Thx > Paul Sanchez > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Salvatore Di > Nardo > *Sent:* Thursday, October 09, 2014 8:03 AM > *To:* gpfsug main discussion list > *Subject:* [gpfsug-discuss] metadata vdisks on fusionio.. doable? > > Hello everyone, > > Suppose we want to build a new GPFS storage using SAN attached > storages, but instead to put metadata in a shared storage, we want to > use FusionIO PCI cards locally on the servers to speed up metadata > operation( http://www.fusionio.com/products/iodrive) and for > reliability, replicate the metadata in all the servers, will this work > in case of server failure? > > To make it more clear: If a server fail i will loose also a metadata > vdisk. Its the replica mechanism its reliable enough to avoid metadata > corruption and loss of data? > > Thanks in advance > Salvatore Di Nardo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lhorrocks-barlow.vcf Type: text/x-vcard Size: 388 bytes Desc: not available URL: From kraemerf at de.ibm.com Mon Oct 13 12:10:17 2014 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Mon, 13 Oct 2014 13:10:17 +0200 Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Message-ID: GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany Oct 14th 11:15-12:05 Room 18 http://sched.co/1uMYEWK Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany From service at metamodul.com Mon Oct 13 16:49:44 2014 From: service at metamodul.com (service at metamodul.com) Date: Mon, 13 Oct 2014 17:49:44 +0200 (CEST) Subject: [gpfsug-discuss] FYI - GPFS at LinuxCon+CloudOpen Europe 2014, Duesseldorf, Germany In-Reply-To: References: Message-ID: <994787708.574787.1413215384447.JavaMail.open-xchange@oxbaltgw12.schlund.de> Hallo Frank, the announcement is a little bit to late for me. Would be nice if you could share your speech later. cheers Hajo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sdinardo at ebi.ac.uk Tue Oct 14 15:39:35 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 15:39:35 +0100 Subject: [gpfsug-discuss] wait for permission to append to log Message-ID: <543D35A7.7080800@ebi.ac.uk> hello all, could someone explain me the meaning of those waiters? gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore From oehmes at us.ibm.com Tue Oct 14 15:51:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 07:51:10 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D35A7.7080800@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> Message-ID: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running and how are the nodes connected with each other (Ethernet or IB) ? ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 16:23:01 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 16:23:01 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> Message-ID: <543D3FD5.1060705@ebi.ac.uk> On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs > and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. 
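A quick way to gauge how widespread this VdiskLogAppendCondvar contention is across the recovery group servers, and to rule out a degraded bond, might be something like the sketch below; the node names are the ones from the mmlscluster output that follows, and bond0 is an assumed interface name:

# Count fast-write-log waiters on each GSS server:
for n in gss01a gss01b gss02a gss02b gss03a gss03b; do
    echo -n "$n: "; ssh $n 'mmdiag --waiters | grep -c VdiskLogAppendCondvar'
done

# Look for down slaves or LACP mismatches on the 4x10Gb bond (bond0 assumed):
cat /proc/net/bonding/bond0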
We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug main discussion list > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Tue Oct 14 17:22:41 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Tue, 14 Oct 2014 09:22:41 -0700 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: <543D3FD5.1060705@ebi.ac.uk> References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: your GSS code version is very backlevel. can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk as well as mmlsconfig and mmlsfs all thx. 
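For anyone collecting the same data, a small sketch that gathers those outputs into one file; the recovery group names in the loop are placeholders and should be replaced with the real names printed by the first mmlsrecoverygroup call:

OUT=/tmp/gss-diag.$(date +%Y%m%d).txt
mmlsrecoverygroup                      >  $OUT      # lists the recovery group names
for rg in gss01a_rg gss01b_rg; do                   # placeholders -- substitute real names
    mmlsrecoverygroup $rg -L --pdisk   >> $OUT
done
mmlsconfig                             >> $OUT
mmlsfs all                             >> $OUT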
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug-discuss at gpfsug.org Date: 10/14/2014 08:23 AM Subject: Re: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org On 14/10/14 15:51, Sven Oehme wrote: it means there is contention on inserting data into the fast write log on the GSS Node, which could be config or workload related what GSS code version are you running [root at ebi5-251 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "3.5.0-11 efix1 (888041)". Built on Jul 9 2013 at 18:03:32 Running 6 days 2 hours 10 minutes 35 secs and how are the nodes connected with each other (Ethernet or IB) ? ethernet. they use the same bonding (4x10Gb/s) where the data is passing. We don't have admin dedicated network [root at gss03a ~]# mmlscluster GPFS cluster information ======================== GPFS cluster name: GSS.ebi.ac.uk GPFS cluster id: 17987981184946329605 GPFS UID domain: GSS.ebi.ac.uk Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp GPFS cluster configuration servers: ----------------------------------- Primary server: gss01a.ebi.ac.uk Secondary server: gss02b.ebi.ac.uk Node Daemon node name IP address Admin node name Designation ----------------------------------------------------------------------- 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different subnet because of datacenter constraints ( They are not physically in the same row, and due to network constraints was not possible to put them in the same subnet). The packets are routed, but should not be a problem as there is 160Gb/s bandwidth between them. Regards, Salvatore ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Salvatore Di Nardo To: gpfsug main discussion list Date: 10/14/2014 07:40 AM Subject: [gpfsug-discuss] wait for permission to append to log Sent by: gpfsug-discuss-bounces at gpfsug.org hello all, could someone explain me the meaning of those waiters? 
gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 
0x7F21EAA8D6A0 waiting 0.109176110 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2BD240 
waiting 0.099074074 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) (VdiskLogAppendCondvar), reason 'wait for permission to append to log' Does it means that the vdisk logs are struggling? Regards, Salvatore _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdinardo at ebi.ac.uk Tue Oct 14 17:39:18 2014 From: sdinardo at ebi.ac.uk (Salvatore Di Nardo) Date: Tue, 14 Oct 2014 17:39:18 +0100 Subject: [gpfsug-discuss] wait for permission to append to log In-Reply-To: References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> Message-ID: <543D51B6.3070602@ebi.ac.uk> Thanks in advance for your help. We have 6 RG: recovery group vdisks vdisks servers ------------------ ----------- ------ ------- gss01a 4 8 gss01a.ebi.ac.uk,gss01b.ebi.ac.uk gss01b 4 8 gss01b.ebi.ac.uk,gss01a.ebi.ac.uk gss02a 4 8 gss02a.ebi.ac.uk,gss02b.ebi.ac.uk gss02b 4 8 gss02b.ebi.ac.uk,gss02a.ebi.ac.uk gss03a 4 8 gss03a.ebi.ac.uk,gss03b.ebi.ac.uk gss03b 4 8 gss03b.ebi.ac.uk,gss03a.ebi.ac.uk Check the attached file for RG details. 
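The attached recovery group details are the per-RG view Sven asked for (mmlsrecoverygroup RGNAME -L --pdisk). A minimal sketch for gathering that same set of outputs into one file, assuming the six recovery group names listed above and a hypothetical destination of /tmp/gss-diag.out:

# Per-recovery-group detail including pdisks, then cluster configuration
# and file system attributes, all appended to a single report file.
out=/tmp/gss-diag.out
: > "$out"
for rg in gss01a gss01b gss02a gss02b gss03a gss03b; do
    echo "=== mmlsrecoverygroup $rg -L --pdisk ===" >> "$out"
    mmlsrecoverygroup "$rg" -L --pdisk >> "$out"
done
{ echo "=== mmlsconfig ==="; mmlsconfig; echo "=== mmlsfs all ==="; mmlsfs all; } >> "$out"

Keeping the per-RG sections in one file and in a fixed order makes it easier to diff a later capture against this one if the waiters come back.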
Following mmlsconfig: [root at gss01a ~]# mmlsconfig Configuration data for cluster GSS.ebi.ac.uk: --------------------------------------------- myNodeConfigNumber 1 clusterName GSS.ebi.ac.uk clusterId 17987981184946329605 autoload no dmapiFileHandleSize 32 minReleaseLevel 3.5.0.11 [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] pagepool 38g nsdRAIDBufferPoolSizePct 80 maxBufferDescs 2m numaMemoryInterleave yes prefetchPct 5 maxblocksize 16m nsdRAIDTracks 128k ioHistorySize 64k nsdRAIDSmallBufferSize 256k nsdMaxWorkerThreads 3k nsdMinWorkerThreads 3k nsdRAIDSmallThreadRatio 2 nsdRAIDThreadsPerQueue 16 nsdClientCksumTypeLocal ck64 nsdClientCksumTypeRemote ck64 nsdRAIDEventLogToConsole all nsdRAIDFastWriteFSDataLimit 64k nsdRAIDFastWriteFSMetadataLimit 256k nsdRAIDReconstructAggressiveness 1 nsdRAIDFlusherBuffersLowWatermarkPct 20 nsdRAIDFlusherBuffersLimitPct 80 nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDFlusherTracksLimitPct 80 nsdRAIDFlusherFWLogHighWatermarkMB 1000 nsdRAIDFlusherFWLogLimitMB 5000 nsdRAIDFlusherThreadsLowWatermark 1 nsdRAIDFlusherThreadsHighWatermark 512 nsdRAIDBlockDeviceMaxSectorsKB 4096 nsdRAIDBlockDeviceNrRequests 32 nsdRAIDBlockDeviceQueueDepth 16 nsdRAIDBlockDeviceScheduler deadline nsdRAIDMaxTransientStale2FT 1 nsdRAIDMaxTransientStale3FT 1 syncWorkerThreads 256 tscWorkerPool 64 nsdInlineWriteMax 32k maxFilesToCache 12k maxStatCache 512 maxGeneralThreads 1280 flushedDataTarget 1024 flushedInodeTarget 1024 maxFileCleaners 1024 maxBufferCleaners 1024 logBufferCount 20 logWrapAmountPct 2 logWrapThreads 128 maxAllocRegionsPerNode 32 maxBackgroundDeletionThreads 16 maxInodeDeallocPrefetch 128 maxMBpS 16000 maxReceiverThreads 128 worker1Threads 1024 worker3Threads 32 [common] cipherList AUTHONLY socketMaxListenConnections 1500 failureDetectionTime 60 [common] adminMode central File systems in cluster GSS.ebi.ac.uk: -------------------------------------- /dev/gpfs1 For more configuration paramenters i also attached a file with the complete output of mmdiag --config. and mmlsfs: File system attributes for /dev/gpfs1: ====================================== flag value description ------------------- ------------------------ ----------------------------------- -f 32768 Minimum fragment size in bytes (system pool) 262144 Minimum fragment size in bytes (other pools) -i 512 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 2 Maximum number of metadata replicas -r 1 Default number of data replicas -R 2 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 1000 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 8388608 Block size (other pools) -Q user;group;fileset Quotas enforced user;group;fileset Default quotas enabled --filesetdf no Fileset df enabled? -V 13.23 (3.5.0.7) File system version --create-time Tue Mar 18 16:01:24 2014 File system creation time -u yes Support for large LUNs? -z no Is DMAPI enabled? -L 4194304 Logfile size -E yes Exact mtime mount option -S yes Suppress atime mount option -K whenpossible Strict replica allocation option --fastea yes Fast external attributes enabled? 
--inode-limit 134217728 Maximum number of inodes -P system;data Disk storage pools in file system -d gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; -d gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; -d gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; -d gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; -d gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 Disks in file system --perfileset-quota no Per-fileset quota enforcement -A yes Automatic mount option -o none Additional mount options -T /gpfs1 Default mount point --mount-priority 0 Mount priority Regards, Salvatore On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > ------------------------------------------------------------------------ > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write log > on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have admin dedicated network > > [root at gss03a ~]# mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: GSS.ebi.ac.uk > GPFS cluster id: 17987981184946329605 > GPFS UID domain: GSS.ebi.ac.uk > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: gss01a.ebi.ac.uk > Secondary server: gss02b.ebi.ac.uk > > Node Daemon node name IP address Admin node name Designation > ----------------------------------------------------------------------- > 1 gss01a.ebi.ac.uk 10.7.28.2 gss01a.ebi.ac.uk quorum-manager > 2 gss01b.ebi.ac.uk 10.7.28.3 gss01b.ebi.ac.uk quorum-manager > 3 gss02a.ebi.ac.uk 10.7.28.67 gss02a.ebi.ac.uk quorum-manager > 4 gss02b.ebi.ac.uk 10.7.28.66 gss02b.ebi.ac.uk quorum-manager > 5 gss03a.ebi.ac.uk 10.7.28.34 gss03a.ebi.ac.uk quorum-manager > 6 gss03b.ebi.ac.uk 10.7.28.35 gss03b.ebi.ac.uk quorum-manager > > > *Note:* The 3 node "pairs" (gss01, gss02 and gss03) are in different > subnet because of datacenter constraints ( They are not physically in > the same row, and due to network constraints was not possible to put > them in the same subnet). The packets are routed, but should not be a > problem as there is 160Gb/s bandwidth between them. > > Regards, > Salvatore > > > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: _oehmes at us.ibm.com_ > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo __ > > To: gpfsug main discussion list __ > > Date: 10/14/2014 07:40 AM > Subject: [gpfsug-discuss] wait for permission to append to log > Sent by: _gpfsug-discuss-bounces at gpfsug.org_ > > ------------------------------------------------------------------------ > > > > hello all, > could someone explain me the meaning of those waiters? 
> [...] > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 42% low DA3 no 2 58 2 1 786 GiB 14 days scrub 4% low DA2 no 2 58 2 1 786 GiB 14 days scrub 4% low DA1 no 3 58 2 1 626 GiB 14 days scrub 59% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 
110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 110 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 110 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 108 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 108 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01a.ebi.ac.uk gss01a.ebi.ac.uk,gss01b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss01b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA1 no 3 58 2 1 626 GiB 14 days scrub 61% low DA2 no 2 58 2 1 786 GiB 14 days scrub 68% low DA3 no 2 58 2 1 786 GiB 14 days scrub 70% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB 
ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 106 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 108 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 108 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 108 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 110 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 110 GiB ok e6d2s12 2 DA3 110 GiB ok e6d3s07 2 DA1 106 
GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 110 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 108 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 108 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss01b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss01b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss01b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss01b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss01b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss01b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss01b.ebi.ac.uk gss01b.ebi.ac.uk,gss01a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 41% low DA3 no 2 58 2 1 786 GiB 14 days scrub 8% low DA2 no 2 58 2 1 786 GiB 14 days scrub 14% low DA1 no 3 58 2 1 626 GiB 14 days scrub 5% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 106 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 106 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 106 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 106 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 106 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 106 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 106 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 106 
GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 106 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 106 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 106 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 106 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 106 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 106 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 106 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 108 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 108 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 108 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 108 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 108 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 108 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 108 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 108 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 108 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 108 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 108 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 108 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 110 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 110 GiB ok e6d3s04 2 DA1 106 GiB ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 108 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 108 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 108 GiB ok e6d5s04 2 DA1 108 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 108 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02a.ebi.ac.uk gss02a.ebi.ac.uk,gss02b.ebi.ac.uk declustered 
recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss02b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 39% low DA1 no 3 58 2 1 626 GiB 14 days scrub 67% low DA2 no 2 58 2 1 786 GiB 14 days scrub 13% low DA3 no 2 58 2 1 786 GiB 14 days scrub 13% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 108 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 108 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 108 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 108 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 108 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 108 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 108 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 108 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 108 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 108 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 108 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 108 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 108 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 108 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 106 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 106 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 108 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 
GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 106 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 106 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 106 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 106 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 106 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 106 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 106 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 106 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 106 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 106 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 106 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 106 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 106 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss02b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss02b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss02b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss02b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss02b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss02b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss02b.ebi.ac.uk gss02b.ebi.ac.uk,gss02a.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03a 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 36% low DA3 no 2 58 2 1 786 GiB 14 days scrub 18% low DA2 no 2 58 2 1 786 GiB 14 days scrub 19% low DA1 no 3 58 2 1 626 GiB 14 days scrub 4% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s01 2 DA3 110 GiB ok e1d1s02 2 DA2 110 GiB ok e1d1s03log 2 LOG 186 GiB ok e1d1s04 2 DA1 108 GiB ok e1d1s05 2 DA2 110 GiB ok e1d1s06 2 DA3 110 GiB ok e1d2s01 2 DA1 108 GiB ok e1d2s02 2 DA2 110 GiB ok e1d2s03 2 DA3 110 GiB ok e1d2s04 2 DA1 108 GiB ok e1d2s05 2 DA2 110 GiB ok e1d2s06 2 DA3 110 GiB ok e1d3s01 2 DA1 108 GiB ok e1d3s02 2 DA2 110 GiB ok e1d3s03 2 DA3 110 GiB ok e1d3s04 2 DA1 108 GiB ok e1d3s05 2 DA2 110 GiB ok e1d3s06 2 DA3 110 GiB ok e1d4s01 2 DA1 108 GiB ok e1d4s02 2 DA2 110 GiB ok e1d4s03 2 DA3 110 GiB ok 
e1d4s04 2 DA1 108 GiB ok e1d4s05 2 DA2 110 GiB ok e1d4s06 2 DA3 110 GiB ok e1d5s01 2 DA1 108 GiB ok e1d5s02 2 DA2 110 GiB ok e1d5s03 2 DA3 110 GiB ok e1d5s04 2 DA1 108 GiB ok e1d5s05 2 DA2 110 GiB ok e1d5s06 2 DA3 110 GiB ok e2d1s01 2 DA3 110 GiB ok e2d1s02 2 DA2 110 GiB ok e2d1s03log 2 LOG 186 GiB ok e2d1s04 2 DA1 108 GiB ok e2d1s05 2 DA2 110 GiB ok e2d1s06 2 DA3 110 GiB ok e2d2s01 2 DA1 108 GiB ok e2d2s02 2 DA2 110 GiB ok e2d2s03 2 DA3 110 GiB ok e2d2s04 2 DA1 108 GiB ok e2d2s05 2 DA2 110 GiB ok e2d2s06 2 DA3 110 GiB ok e2d3s01 2 DA1 108 GiB ok e2d3s02 2 DA2 110 GiB ok e2d3s03 2 DA3 110 GiB ok e2d3s04 2 DA1 108 GiB ok e2d3s05 2 DA2 110 GiB ok e2d3s06 2 DA3 110 GiB ok e2d4s01 2 DA1 108 GiB ok e2d4s02 2 DA2 110 GiB ok e2d4s03 2 DA3 110 GiB ok e2d4s04 2 DA1 108 GiB ok e2d4s05 2 DA2 110 GiB ok e2d4s06 2 DA3 110 GiB ok e2d5s01 2 DA1 108 GiB ok e2d5s02 2 DA2 110 GiB ok e2d5s03 2 DA3 110 GiB ok e2d5s04 2 DA1 108 GiB ok e2d5s05 2 DA2 110 GiB ok e2d5s06 2 DA3 110 GiB ok e3d1s01 2 DA1 108 GiB ok e3d1s02 2 DA3 110 GiB ok e3d1s03log 2 LOG 186 GiB ok e3d1s04 2 DA1 108 GiB ok e3d1s05 2 DA2 110 GiB ok e3d1s06 2 DA3 110 GiB ok e3d2s01 2 DA1 108 GiB ok e3d2s02 2 DA2 110 GiB ok e3d2s03 2 DA3 110 GiB ok e3d2s04 2 DA1 108 GiB ok e3d2s05 2 DA2 110 GiB ok e3d2s06 2 DA3 110 GiB ok e3d3s01 2 DA1 108 GiB ok e3d3s02 2 DA2 110 GiB ok e3d3s03 2 DA3 110 GiB ok e3d3s04 2 DA1 108 GiB ok e3d3s05 2 DA2 110 GiB ok e3d3s06 2 DA3 110 GiB ok e3d4s01 2 DA1 108 GiB ok e3d4s02 2 DA2 110 GiB ok e3d4s03 2 DA3 110 GiB ok e3d4s04 2 DA1 108 GiB ok e3d4s05 2 DA2 110 GiB ok e3d4s06 2 DA3 110 GiB ok e3d5s01 2 DA1 108 GiB ok e3d5s02 2 DA2 110 GiB ok e3d5s03 2 DA3 110 GiB ok e3d5s04 2 DA1 108 GiB ok e3d5s05 2 DA2 110 GiB ok e3d5s06 2 DA3 110 GiB ok e4d1s01 2 DA1 108 GiB ok e4d1s02 2 DA3 110 GiB ok e4d1s04 2 DA1 108 GiB ok e4d1s05 2 DA2 110 GiB ok e4d1s06 2 DA3 110 GiB ok e4d2s01 2 DA1 108 GiB ok e4d2s02 2 DA2 110 GiB ok e4d2s03 2 DA3 110 GiB ok e4d2s04 2 DA1 106 GiB ok e4d2s05 2 DA2 110 GiB ok e4d2s06 2 DA3 110 GiB ok e4d3s01 2 DA1 106 GiB ok e4d3s02 2 DA2 110 GiB ok e4d3s03 2 DA3 110 GiB ok e4d3s04 2 DA1 106 GiB ok e4d3s05 2 DA2 110 GiB ok e4d3s06 2 DA3 110 GiB ok e4d4s01 2 DA1 106 GiB ok e4d4s02 2 DA2 110 GiB ok e4d4s03 2 DA3 110 GiB ok e4d4s04 2 DA1 106 GiB ok e4d4s05 2 DA2 110 GiB ok e4d4s06 2 DA3 110 GiB ok e4d5s01 2 DA1 106 GiB ok e4d5s02 2 DA2 110 GiB ok e4d5s03 2 DA3 110 GiB ok e4d5s04 2 DA1 106 GiB ok e4d5s05 2 DA2 110 GiB ok e4d5s06 2 DA3 110 GiB ok e5d1s01 2 DA1 106 GiB ok e5d1s02 2 DA2 110 GiB ok e5d1s04 2 DA1 106 GiB ok e5d1s05 2 DA2 110 GiB ok e5d1s06 2 DA3 110 GiB ok e5d2s01 2 DA1 106 GiB ok e5d2s02 2 DA2 110 GiB ok e5d2s03 2 DA3 110 GiB ok e5d2s04 2 DA1 106 GiB ok e5d2s05 2 DA2 110 GiB ok e5d2s06 2 DA3 110 GiB ok e5d3s01 2 DA1 106 GiB ok e5d3s02 2 DA2 110 GiB ok e5d3s03 2 DA3 110 GiB ok e5d3s04 2 DA1 106 GiB ok e5d3s05 2 DA2 110 GiB ok e5d3s06 2 DA3 110 GiB ok e5d4s01 2 DA1 106 GiB ok e5d4s02 2 DA2 110 GiB ok e5d4s03 2 DA3 110 GiB ok e5d4s04 2 DA1 106 GiB ok e5d4s05 2 DA2 110 GiB ok e5d4s06 2 DA3 110 GiB ok e5d5s01 2 DA1 106 GiB ok e5d5s02 2 DA2 110 GiB ok e5d5s03 2 DA3 110 GiB ok e5d5s04 2 DA1 106 GiB ok e5d5s05 2 DA2 110 GiB ok e5d5s06 2 DA3 110 GiB ok e6d1s01 2 DA1 106 GiB ok e6d1s02 2 DA2 110 GiB ok e6d1s04 2 DA1 106 GiB ok e6d1s05 2 DA2 110 GiB ok e6d1s06 2 DA3 110 GiB ok e6d2s01 2 DA1 106 GiB ok e6d2s02 2 DA2 110 GiB ok e6d2s03 2 DA3 110 GiB ok e6d2s04 2 DA1 106 GiB ok e6d2s05 2 DA2 108 GiB ok e6d2s06 2 DA3 108 GiB ok e6d3s01 2 DA1 106 GiB ok e6d3s02 2 DA2 108 GiB ok e6d3s03 2 DA3 108 GiB ok e6d3s04 2 DA1 106 GiB 
ok e6d3s05 2 DA2 108 GiB ok e6d3s06 2 DA3 108 GiB ok e6d4s01 2 DA1 106 GiB ok e6d4s02 2 DA2 108 GiB ok e6d4s03 2 DA3 108 GiB ok e6d4s04 2 DA1 106 GiB ok e6d4s05 2 DA2 108 GiB ok e6d4s06 2 DA3 108 GiB ok e6d5s01 2 DA1 106 GiB ok e6d5s02 2 DA2 110 GiB ok e6d5s03 2 DA3 110 GiB ok e6d5s04 2 DA1 106 GiB ok e6d5s05 2 DA2 110 GiB ok e6d5s06 2 DA3 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03a_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03a_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03a_MetaData_8M_3p_1 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03a_MetaData_8M_3p_3 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03a_Data_8M_3p_1 8+3p DA3 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03a_Data_8M_3p_3 8+3p DA1 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03a.ebi.ac.uk gss03a.ebi.ac.uk,gss03b.ebi.ac.uk declustered recovery group arrays vdisks pdisks format version ----------------- ----------- ------ ------ -------------- gss03b 4 8 177 3.5.0.5 declustered needs replace scrub background activity array service vdisks pdisks spares threshold free space duration task progress priority ----------- ------- ------ ------ ------ --------- ---------- -------- ------------------------- LOG no 1 3 0 1 558 GiB 14 days scrub 38% low DA1 no 3 58 2 1 626 GiB 14 days scrub 12% low DA2 no 2 58 2 1 786 GiB 14 days scrub 20% low DA3 no 2 58 2 1 786 GiB 14 days scrub 19% low number of declustered pdisk active paths array free space state ----------------- ------------ ----------- ---------- ----- e1d1s07 2 DA1 108 GiB ok e1d1s08 2 DA2 110 GiB ok e1d1s09 2 DA3 110 GiB ok e1d1s10 2 DA1 108 GiB ok e1d1s11 2 DA2 110 GiB ok e1d1s12 2 DA3 110 GiB ok e1d2s07 2 DA1 108 GiB ok e1d2s08 2 DA2 110 GiB ok e1d2s09 2 DA3 110 GiB ok e1d2s10 2 DA1 108 GiB ok e1d2s11 2 DA2 110 GiB ok e1d2s12 2 DA3 110 GiB ok e1d3s07 2 DA1 108 GiB ok e1d3s08 2 DA2 110 GiB ok e1d3s09 2 DA3 110 GiB ok e1d3s10 2 DA1 108 GiB ok e1d3s11 2 DA2 110 GiB ok e1d3s12 2 DA3 110 GiB ok e1d4s07 2 DA1 108 GiB ok e1d4s08 2 DA2 110 GiB ok e1d4s09 2 DA3 110 GiB ok e1d4s10 2 DA1 108 GiB ok e1d4s11 2 DA2 110 GiB ok e1d4s12 2 DA3 110 GiB ok e1d5s07 2 DA1 108 GiB ok e1d5s08 2 DA2 110 GiB ok e1d5s09 2 DA3 110 GiB ok e1d5s10 2 DA3 110 GiB ok e1d5s11 2 DA2 110 GiB ok e1d5s12log 2 LOG 186 GiB ok e2d1s07 2 DA1 108 GiB ok e2d1s08 2 DA2 110 GiB ok e2d1s09 2 DA3 110 GiB ok e2d1s10 2 DA1 106 GiB ok e2d1s11 2 DA2 110 GiB ok e2d1s12 2 DA3 110 GiB ok e2d2s07 2 DA1 106 GiB ok e2d2s08 2 DA2 110 GiB ok e2d2s09 2 DA3 110 GiB ok e2d2s10 2 DA1 106 GiB ok e2d2s11 2 DA2 110 GiB ok e2d2s12 2 DA3 110 GiB ok e2d3s07 2 DA1 106 GiB ok e2d3s08 2 DA2 110 GiB ok e2d3s09 2 DA3 110 GiB ok e2d3s10 2 DA1 106 GiB ok e2d3s11 2 DA2 110 GiB ok e2d3s12 2 DA3 110 GiB ok e2d4s07 2 DA1 106 GiB ok e2d4s08 2 DA2 110 GiB ok e2d4s09 2 DA3 110 GiB ok e2d4s10 2 DA1 108 GiB ok e2d4s11 2 DA2 110 GiB ok e2d4s12 2 DA3 110 GiB ok e2d5s07 2 DA1 108 GiB ok e2d5s08 2 DA2 110 GiB ok e2d5s09 2 DA3 110 GiB ok e2d5s10 2 DA3 110 GiB ok e2d5s11 2 DA2 110 GiB ok e2d5s12log 2 LOG 186 GiB ok e3d1s07 2 DA1 108 GiB ok e3d1s08 2 DA2 110 GiB ok e3d1s09 2 DA3 110 GiB ok e3d1s10 2 DA1 106 GiB ok e3d1s11 2 DA2 110 GiB ok e3d1s12 2 DA3 110 GiB ok e3d2s07 2 DA1 106 GiB ok e3d2s08 2 DA2 110 GiB ok e3d2s09 2 DA3 110 GiB ok 
e3d2s10 2 DA1 108 GiB ok e3d2s11 2 DA2 110 GiB ok e3d2s12 2 DA3 110 GiB ok e3d3s07 2 DA1 106 GiB ok e3d3s08 2 DA2 110 GiB ok e3d3s09 2 DA3 110 GiB ok e3d3s10 2 DA1 106 GiB ok e3d3s11 2 DA2 110 GiB ok e3d3s12 2 DA3 110 GiB ok e3d4s07 2 DA1 106 GiB ok e3d4s08 2 DA2 110 GiB ok e3d4s09 2 DA3 110 GiB ok e3d4s10 2 DA1 108 GiB ok e3d4s11 2 DA2 110 GiB ok e3d4s12 2 DA3 110 GiB ok e3d5s07 2 DA1 108 GiB ok e3d5s08 2 DA2 110 GiB ok e3d5s09 2 DA3 110 GiB ok e3d5s10 2 DA1 106 GiB ok e3d5s11 2 DA3 110 GiB ok e3d5s12log 2 LOG 186 GiB ok e4d1s07 2 DA1 106 GiB ok e4d1s08 2 DA2 110 GiB ok e4d1s09 2 DA3 110 GiB ok e4d1s10 2 DA1 106 GiB ok e4d1s11 2 DA2 110 GiB ok e4d1s12 2 DA3 110 GiB ok e4d2s07 2 DA1 106 GiB ok e4d2s08 2 DA2 110 GiB ok e4d2s09 2 DA3 110 GiB ok e4d2s10 2 DA1 106 GiB ok e4d2s11 2 DA2 110 GiB ok e4d2s12 2 DA3 110 GiB ok e4d3s07 2 DA1 108 GiB ok e4d3s08 2 DA2 110 GiB ok e4d3s09 2 DA3 110 GiB ok e4d3s10 2 DA1 108 GiB ok e4d3s11 2 DA2 110 GiB ok e4d3s12 2 DA3 110 GiB ok e4d4s07 2 DA1 106 GiB ok e4d4s08 2 DA2 110 GiB ok e4d4s09 2 DA3 110 GiB ok e4d4s10 2 DA1 106 GiB ok e4d4s11 2 DA2 110 GiB ok e4d4s12 2 DA3 110 GiB ok e4d5s07 2 DA1 106 GiB ok e4d5s08 2 DA2 110 GiB ok e4d5s09 2 DA3 110 GiB ok e4d5s10 2 DA1 106 GiB ok e4d5s11 2 DA3 110 GiB ok e5d1s07 2 DA1 108 GiB ok e5d1s08 2 DA2 110 GiB ok e5d1s09 2 DA3 110 GiB ok e5d1s10 2 DA1 106 GiB ok e5d1s11 2 DA2 110 GiB ok e5d1s12 2 DA3 110 GiB ok e5d2s07 2 DA1 108 GiB ok e5d2s08 2 DA2 110 GiB ok e5d2s09 2 DA3 110 GiB ok e5d2s10 2 DA1 108 GiB ok e5d2s11 2 DA2 110 GiB ok e5d2s12 2 DA3 110 GiB ok e5d3s07 2 DA1 108 GiB ok e5d3s08 2 DA2 110 GiB ok e5d3s09 2 DA3 110 GiB ok e5d3s10 2 DA1 106 GiB ok e5d3s11 2 DA2 110 GiB ok e5d3s12 2 DA3 110 GiB ok e5d4s07 2 DA1 108 GiB ok e5d4s08 2 DA2 110 GiB ok e5d4s09 2 DA3 110 GiB ok e5d4s10 2 DA1 108 GiB ok e5d4s11 2 DA2 110 GiB ok e5d4s12 2 DA3 110 GiB ok e5d5s07 2 DA1 108 GiB ok e5d5s08 2 DA2 110 GiB ok e5d5s09 2 DA3 110 GiB ok e5d5s10 2 DA1 106 GiB ok e5d5s11 2 DA2 110 GiB ok e6d1s07 2 DA1 108 GiB ok e6d1s08 2 DA2 110 GiB ok e6d1s09 2 DA3 110 GiB ok e6d1s10 2 DA1 108 GiB ok e6d1s11 2 DA2 110 GiB ok e6d1s12 2 DA3 110 GiB ok e6d2s07 2 DA1 106 GiB ok e6d2s08 2 DA2 110 GiB ok e6d2s09 2 DA3 108 GiB ok e6d2s10 2 DA1 108 GiB ok e6d2s11 2 DA2 108 GiB ok e6d2s12 2 DA3 108 GiB ok e6d3s07 2 DA1 106 GiB ok e6d3s08 2 DA2 108 GiB ok e6d3s09 2 DA3 108 GiB ok e6d3s10 2 DA1 106 GiB ok e6d3s11 2 DA2 108 GiB ok e6d3s12 2 DA3 108 GiB ok e6d4s07 2 DA1 106 GiB ok e6d4s08 2 DA2 108 GiB ok e6d4s09 2 DA3 108 GiB ok e6d4s10 2 DA1 108 GiB ok e6d4s11 2 DA2 108 GiB ok e6d4s12 2 DA3 110 GiB ok e6d5s07 2 DA1 108 GiB ok e6d5s08 2 DA2 110 GiB ok e6d5s09 2 DA3 110 GiB ok e6d5s10 2 DA1 108 GiB ok e6d5s11 2 DA2 110 GiB ok declustered checksum vdisk RAID code array vdisk size block size granularity remarks ------------------ ------------------ ----------- ---------- ---------- ----------- ------- gss03b_logtip 3WayReplication LOG 128 MiB 1 MiB 512 logTip gss03b_loghome 4WayReplication DA1 40 GiB 1 MiB 512 log gss03b_MetaData_8M_3p_1 3WayReplication DA1 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_2 3WayReplication DA2 5121 GiB 1 MiB 32 KiB gss03b_MetaData_8M_3p_3 3WayReplication DA3 5121 GiB 1 MiB 32 KiB gss03b_Data_8M_3p_1 8+3p DA1 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_2 8+3p DA2 99 TiB 8 MiB 32 KiB gss03b_Data_8M_3p_3 8+3p DA3 99 TiB 8 MiB 32 KiB active recovery group server servers ----------------------------------------------- ------- gss03b.ebi.ac.uk gss03b.ebi.ac.uk,gss03a.ebi.ac.uk -------------- next part -------------- === mmdiag: config === 
allowDeleteAclOnChmod 1 assertOnStructureError 0 atimeDeferredSeconds 86400 ! cipherList AUTHONLY ! clusterId 17987981184946329605 ! clusterName GSS.ebi.ac.uk consoleLogEvents 0 dataStructureDump 1 /tmp/mmfs dataStructureDumpOnRGOpenFailed 0 /tmp/mmfs dataStructureDumpOnSGPanic 0 /tmp/mmfs dataStructureDumpWait 60 dbBlockSizeThreshold -1 distributedTokenServer 1 dmapiAllowMountOnWindows 1 dmapiDataEventRetry 2 dmapiEnable 1 dmapiEventBuffers 64 dmapiEventTimeout -1 ! dmapiFileHandleSize 32 dmapiMountEvent all dmapiMountTimeout 60 dmapiSessionFailureTimeout 0 dmapiWorkerThreads 12 enableIPv6 0 enableLowspaceEvents 0 enableNFSCluster 0 enableStatUIDremap 0 enableTreeBasedQuotas 0 enableUIDremap 0 encryptionCryptoEngineLibName (NULL) encryptionCryptoEngineType CLiC enforceFilesetQuotaOnRoot 0 envVar ! failureDetectionTime 60 fgdlActivityTimeWindow 10 fgdlLeaveThreshold 1000 fineGrainDirLocks 1 FIPS1402mode 0 FleaDisableIntegrityChecks 0 FleaNumAsyncIOThreads 2 FleaNumLEBBuffers 256 FleaPreferredStripSize 0 ! flushedDataTarget 1024 ! flushedInodeTarget 1024 healthCheckInterval 10 idleSocketTimeout 3600 ignorePrefetchLUNCount 0 ignoreReplicaSpaceOnStat 0 ignoreReplicationForQuota 0 ignoreReplicationOnStatfs 0 ! ioHistorySize 65536 iscanPrefetchAggressiveness 2 leaseDMSTimeout -1 leaseDuration -1 leaseRecoveryWait 35 ! logBufferCount 20 ! logWrapAmountPct 2 ! logWrapThreads 128 lrocChecksum 0 lrocData 1 lrocDataMaxBufferSize 32768 lrocDataMaxFileSize 32768 lrocDataStubFileSize 0 lrocDeviceMaxSectorsKB 64 lrocDeviceNrRequests 1024 lrocDeviceQueueDepth 31 lrocDevices lrocDeviceScheduler deadline lrocDeviceSetParams 1 lrocDirectories 1 lrocInodes 1 ! maxAllocRegionsPerNode 32 ! maxBackgroundDeletionThreads 16 ! maxblocksize 16777216 ! maxBufferCleaners 1024 ! maxBufferDescs 2097152 maxDiskAddrBuffs -1 maxFcntlRangesPerFile 200 ! maxFileCleaners 1024 maxFileNameBytes 255 ! maxFilesToCache 12288 ! maxGeneralThreads 1280 ! maxInodeDeallocPrefetch 128 ! maxMBpS 16000 maxMissedPingTimeout 60 ! maxReceiverThreads 128 ! maxStatCache 512 maxTokenServers 128 minMissedPingTimeout 3 minQuorumNodes 1 ! minReleaseLevel 1340 ! myNodeConfigNumber 5 noSpaceEventInterval 120 nsdBufSpace (% of PagePool) 30 ! nsdClientCksumTypeLocal NsdCksum_Ck64 ! nsdClientCksumTypeRemote NsdCksum_Ck64 nsdDumpBuffersOnCksumError 0 nsd_cksum_capture ! nsdInlineWriteMax 32768 ! nsdMaxWorkerThreads 3072 ! nsdMinWorkerThreads 3072 nsdMultiQueue 256 nsdRAIDAllowTraditionalNSD 0 nsdRAIDAULogColocationLimit 131072 nsdRAIDBackgroundMinPct 5 ! nsdRAIDBlockDeviceMaxSectorsKB 4096 ! nsdRAIDBlockDeviceNrRequests 32 ! nsdRAIDBlockDeviceQueueDepth 16 ! nsdRAIDBlockDeviceScheduler deadline ! nsdRAIDBufferPoolSizePct (% of PagePool) 80 nsdRAIDBuffersPromotionThresholdPct 50 nsdRAIDCreateVdiskThreads 8 nsdRAIDDiskDiscoveryInterval 180 ! nsdRAIDEventLogToConsole all ! nsdRAIDFastWriteFSDataLimit 65536 ! nsdRAIDFastWriteFSMetadataLimit 262144 ! nsdRAIDFlusherBuffersLimitPct 80 ! nsdRAIDFlusherBuffersLowWatermarkPct 20 ! nsdRAIDFlusherFWLogHighWatermarkMB 1000 ! nsdRAIDFlusherFWLogLimitMB 5000 ! nsdRAIDFlusherThreadsHighWatermark 512 ! nsdRAIDFlusherThreadsLowWatermark 1 ! nsdRAIDFlusherTracksLimitPct 80 ! nsdRAIDFlusherTracksLowWatermarkPct 20 nsdRAIDForegroundMinPct 15 ! nsdRAIDMaxTransientStale2FT 1 ! nsdRAIDMaxTransientStale3FT 1 nsdRAIDMediumWriteLimitPct 50 nsdRAIDMultiQueue -1 ! nsdRAIDReconstructAggressiveness 1 ! nsdRAIDSmallBufferSize 262144 ! nsdRAIDSmallThreadRatio 2 ! nsdRAIDThreadsPerQueue 16 ! nsdRAIDTracks 131072 ! 
numaMemoryInterleave yes opensslLibName /usr/lib64/libssl.so.10:/usr/lib64/libssl.so.6:/usr/lib64/libssl.so.0.9.8:/lib64/libssl.so.6:libssl.so:libssl.so.0:libssl.so.4 ! pagepool 40802189312 pagepoolMaxPhysMemPct 75 prefetchAggressiveness 2 prefetchAggressivenessRead -1 prefetchAggressivenessWrite -1 ! prefetchPct 5 prefetchThreads 72 readReplicaPolicy default remoteMountTimeout 10 sharedMemLimit 0 sharedMemReservePct 15 sidAutoMapRangeLength 15000000 sidAutoMapRangeStart 15000000 ! socketMaxListenConnections 1500 socketRcvBufferSize 0 socketSndBufferSize 0 statCacheDirPct 10 subnets ! syncWorkerThreads 256 tiebreaker system tiebreakerDisks tokenMemLimit 536870912 treatOSyncLikeODSync 1 tscTcpPort 1191 ! tscWorkerPool 64 uidDomain GSS.ebi.ac.uk uidExpiration 36000 unmountOnDiskFail no useDIOXW 1 usePersistentReserve 0 verbsLibName libibverbs.so verbsPorts verbsRdma disable verbsRdmaCm disable verbsRdmaCmLibName librdmacm.so verbsRdmaMaxSendBytes 16777216 verbsRdmaMinBytes 8192 verbsRdmaQpRtrMinRnrTimer 18 verbsRdmaQpRtrPathMtu 2048 verbsRdmaQpRtrSl 0 verbsRdmaQpRtrSlDynamic 0 verbsRdmaQpRtrSlDynamicTimeout 10 verbsRdmaQpRtsRetryCnt 6 verbsRdmaQpRtsRnrRetry 6 verbsRdmaQpRtsTimeout 18 verbsRdmaSend 0 verbsRdmasPerConnection 8 verbsRdmasPerNode 0 verbsRdmaTimeout 18 verifyGpfsReady 0 ! worker1Threads 1024 ! worker3Threads 32 writebehindThreshold 524288

From oehmes at us.ibm.com Tue Oct 14 18:23:50 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:23:50 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To: <543D51B6.3070602@ebi.ac.uk>
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

You are basically running GSS 1.0 code, while the current version is GSS 2.0
(which replaced version 1.5 two months ago).

GSS 1.5 and 2.0 have several enhancements in this area, so I strongly
encourage you to upgrade your systems.

If you can describe your workload in a bit more detail, there may also be
additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

gpfsug-discuss-bounces at gpfsug.org wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo
> To: gpfsug main discussion list
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> Thanks in advance for your help.
>
> We have 6 RGs:
>
> recovery group        vdisks  vdisks  servers
> ------------------    ------  ------  -------
> gss01a                     4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                     4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                     4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                     4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                     4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                     4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
> Following mmlsconfig: > [root at gss01a ~]# mmlsconfig > Configuration data for cluster GSS.ebi.ac.uk: > --------------------------------------------- > myNodeConfigNumber 1 > clusterName GSS.ebi.ac.uk > clusterId 17987981184946329605 > autoload no > dmapiFileHandleSize 32 > minReleaseLevel 3.5.0.11 > [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b] > pagepool 38g > nsdRAIDBufferPoolSizePct 80 > maxBufferDescs 2m > numaMemoryInterleave yes > prefetchPct 5 > maxblocksize 16m > nsdRAIDTracks 128k > ioHistorySize 64k > nsdRAIDSmallBufferSize 256k > nsdMaxWorkerThreads 3k > nsdMinWorkerThreads 3k > nsdRAIDSmallThreadRatio 2 > nsdRAIDThreadsPerQueue 16 > nsdClientCksumTypeLocal ck64 > nsdClientCksumTypeRemote ck64 > nsdRAIDEventLogToConsole all > nsdRAIDFastWriteFSDataLimit 64k > nsdRAIDFastWriteFSMetadataLimit 256k > nsdRAIDReconstructAggressiveness 1 > nsdRAIDFlusherBuffersLowWatermarkPct 20 > nsdRAIDFlusherBuffersLimitPct 80 > nsdRAIDFlusherTracksLowWatermarkPct 20 > nsdRAIDFlusherTracksLimitPct 80 > nsdRAIDFlusherFWLogHighWatermarkMB 1000 > nsdRAIDFlusherFWLogLimitMB 5000 > nsdRAIDFlusherThreadsLowWatermark 1 > nsdRAIDFlusherThreadsHighWatermark 512 > nsdRAIDBlockDeviceMaxSectorsKB 4096 > nsdRAIDBlockDeviceNrRequests 32 > nsdRAIDBlockDeviceQueueDepth 16 > nsdRAIDBlockDeviceScheduler deadline > nsdRAIDMaxTransientStale2FT 1 > nsdRAIDMaxTransientStale3FT 1 > syncWorkerThreads 256 > tscWorkerPool 64 > nsdInlineWriteMax 32k > maxFilesToCache 12k > maxStatCache 512 > maxGeneralThreads 1280 > flushedDataTarget 1024 > flushedInodeTarget 1024 > maxFileCleaners 1024 > maxBufferCleaners 1024 > logBufferCount 20 > logWrapAmountPct 2 > logWrapThreads 128 > maxAllocRegionsPerNode 32 > maxBackgroundDeletionThreads 16 > maxInodeDeallocPrefetch 128 > maxMBpS 16000 > maxReceiverThreads 128 > worker1Threads 1024 > worker3Threads 32 > [common] > cipherList AUTHONLY > socketMaxListenConnections 1500 > failureDetectionTime 60 > [common] > adminMode central > > File systems in cluster GSS.ebi.ac.uk: > -------------------------------------- > /dev/gpfs1 > For more configuration paramenters i also attached a file with the > complete output of mmdiag --config. > > > and mmlsfs: > > File system attributes for /dev/gpfs1: > ====================================== > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 32768 Minimum fragment size > in bytes (system pool) > 262144 Minimum fragment size > in bytes (other pools) > -i 512 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of > metadata replicas > -M 2 Maximum number of > metadata replicas > -r 1 Default number of data replicas > -R 2 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 1000 Estimated number of > nodes that will mount file system > -B 1048576 Block size (system pool) > 8388608 Block size (other pools) > -Q user;group;fileset Quotas enforced > user;group;fileset Default quotas enabled > --filesetdf no Fileset df enabled? > -V 13.23 (3.5.0.7) File system version > --create-time Tue Mar 18 16:01:24 2014 File system creation time > -u yes Support for large LUNs? > -z no Is DMAPI enabled? > -L 4194304 Logfile size > -E yes Exact mtime mount option > -S yes Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea yes Fast external attributes enabled? 
> --inode-limit 134217728 Maximum number of inodes > -P system;data Disk storage pools in file system > -d > gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1; > -d > gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2; > -d > gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1; > -d > gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1; > -d > gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3 > Disks in file system > --perfileset-quota no Per-fileset quota enforcement > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs1 Default mount point > --mount-priority 0 Mount priority > > > Regards, > Salvatore > > On 14/10/14 17:22, Sven Oehme wrote: > your GSS code version is very backlevel. > > can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk > as well as mmlsconfig and mmlsfs all > > thx. Sven > > ------------------------------------------ > Sven Oehme > Scalable Storage Research > email: oehmes at us.ibm.com > Phone: +1 (408) 824-8904 > IBM Almaden Research Lab > ------------------------------------------ > > > > From: Salvatore Di Nardo > To: gpfsug-discuss at gpfsug.org > Date: 10/14/2014 08:23 AM > Subject: Re: [gpfsug-discuss] wait for permission to append to log > Sent by: gpfsug-discuss-bounces at gpfsug.org > > > > > On 14/10/14 15:51, Sven Oehme wrote: > it means there is contention on inserting data into the fast write > log on the GSS Node, which could be config or workload related > what GSS code version are you running > [root at ebi5-251 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "3.5.0-11 efix1 (888041)". > Built on Jul 9 2013 at 18:03:32 > Running 6 days 2 hours 10 minutes 35 secs > > > > and how are the nodes connected with each other (Ethernet or IB) ? > ethernet. they use the same bonding (4x10Gb/s) where the data is > passing. 
We don't have a dedicated admin network.
>
> [root at gss03a ~]# mmlscluster
>
> GPFS cluster information
> ========================
>   GPFS cluster name:         GSS.ebi.ac.uk
>   GPFS cluster id:           17987981184946329605
>   GPFS UID domain:           GSS.ebi.ac.uk
>   Remote shell command:      /usr/bin/ssh
>   Remote file copy command:  /usr/bin/scp
>
> GPFS cluster configuration servers:
> -----------------------------------
>   Primary server:    gss01a.ebi.ac.uk
>   Secondary server:  gss02b.ebi.ac.uk
>
>  Node  Daemon node name   IP address   Admin node name    Designation
> -----------------------------------------------------------------------
>    1   gss01a.ebi.ac.uk   10.7.28.2    gss01a.ebi.ac.uk   quorum-manager
>    2   gss01b.ebi.ac.uk   10.7.28.3    gss01b.ebi.ac.uk   quorum-manager
>    3   gss02a.ebi.ac.uk   10.7.28.67   gss02a.ebi.ac.uk   quorum-manager
>    4   gss02b.ebi.ac.uk   10.7.28.66   gss02b.ebi.ac.uk   quorum-manager
>    5   gss03a.ebi.ac.uk   10.7.28.34   gss03a.ebi.ac.uk   quorum-manager
>    6   gss03b.ebi.ac.uk   10.7.28.35   gss03b.ebi.ac.uk   quorum-manager
>
> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different
> subnets because of datacenter constraints (they are not physically
> in the same row, and due to network constraints it was not possible to
> put them in the same subnet). The packets are routed, but this should
> not be a problem as there is 160 Gb/s of bandwidth between them.
>
> Regards,
> Salvatore
>
> ------------------------------------------
> Sven Oehme
> Scalable Storage Research
> email: oehmes at us.ibm.com
> Phone: +1 (408) 824-8904
> IBM Almaden Research Lab
> ------------------------------------------
>
> From: Salvatore Di Nardo
> To: gpfsug main discussion list
> Date: 10/14/2014 07:40 AM
> Subject: [gpfsug-discuss] wait for permission to append to log
> Sent by: gpfsug-discuss-bounces at gpfsug.org
>
> hello all,
> could someone explain to me the meaning of these waiters?
> > gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 
0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append 
to log' > gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > > Does it means that the vdisk logs are struggling? > > Regards, > Salvatore > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ > IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
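A quick way to see whether the 'wait for permission to append to log' waits really dominate on a server is to summarize the waiter output by reason. This is only a sketch and assumes the listing above was captured with mmdiag --waiters (or an equivalent waiter dump) on the GSS node in question; the grep/sort/uniq part is plain shell:

# Count the waiters currently blocked on the vdisk log append condvar
# (sketch; assumes the listing above came from "mmdiag --waiters").
mmdiag --waiters | grep -c 'VdiskLogAppendCondvar'

# Group all current waiters by their reason string to see which wait dominates.
mmdiag --waiters | grep -o "reason '[^']*'" | sort | uniq -c | sort -rn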
From zgiles at gmail.com Tue Oct 14 18:32:50 2014
From: zgiles at gmail.com (Zachary Giles)
Date: Tue, 14 Oct 2014 13:32:50 -0400
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

Except that, AFAIK, no one has published how to update GSS or where the
update code is. All I've heard is "contact your sales rep". Any pointers?

On Tue, Oct 14, 2014 at 1:23 PM, Sven Oehme wrote:
> You are basically running GSS 1.0 code, while the current version is GSS 2.0
> (which replaced version 1.5 two months ago).
>
> GSS 1.5 and 2.0 have several enhancements in this area, so I strongly
> encourage you to upgrade your systems.
>
> If you can describe your workload in a bit more detail, there may also be
> additional knobs we can turn to change the behavior.

--
Zach Giles
zgiles at gmail.com
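Whatever the official upgrade path turns out to be, it is worth recording the level each GSS building block currently runs before touching anything. A minimal sketch, assuming the GPFS commands used elsewhere in this thread are in the PATH and that the servers are RPM-based; the package name pattern is an assumption, so adjust it to whatever your distribution actually ships:

# GPFS daemon build and the cluster-wide minimum release level.
mmdiag --version
mmlsconfig minReleaseLevel

# Installed GPFS/GSS related packages (pattern is a guess, adjust as needed).
rpm -qa | grep -iE 'gpfs|gss' | sort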
From oehmes at us.ibm.com Tue Oct 14 18:38:10 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Tue, 14 Oct 2014 10:38:10 -0700
Subject: [gpfsug-discuss] wait for permission to append to log
In-Reply-To:
References: <543D35A7.7080800@ebi.ac.uk> <543D3FD5.1060705@ebi.ac.uk> <543D51B6.3070602@ebi.ac.uk>
Message-ID:

I personally don't know, since I am in GPFS Research, not in support :-)
But have you tried to contact your sales rep? If you are not successful with
that, shoot me a direct email with details about your company name, country
and customer number, and I will try to get somebody to help you.

thx. Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From: Zachary Giles
To: gpfsug main discussion list
Date: 10/14/2014 10:33 AM
Subject: Re: [gpfsug-discuss] wait for permission to append to log
Sent by: gpfsug-discuss-bounces at gpfsug.org

Except that, AFAIK, no one has published how to update GSS or where the
update code is. All I've heard is "contact your sales rep". Any pointers?
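Since the thread keeps coming back to the fast write log, it may also help to snapshot the related tuning values before and after any code upgrade, so the effect of the new defaults is visible. A sketch that only queries parameters already shown in the mmlsconfig/mmdiag output earlier in this thread:

# Show the fast-write-log and flusher related settings on a GSS server.
mmdiag --config | grep -E 'nsdRAIDFastWrite|nsdRAIDFlusher|logBufferCount|logWrap' | sort

# Keep a dated copy so pre- and post-upgrade values can be compared.
mmdiag --config > /tmp/mmdiag-config.$(hostname -s).$(date +%Y%m%d)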
on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, >> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) >> (VdiskLogAppendCondvar), reason 'wait for permission to append to log' >> >> Does it means that the vdisk logs are struggling? 
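(For anyone digging into the same symptom later: Sven's reply above points at contention on the GSS fast write log, and a quick way to confirm that the stuck NSD threads are all parked on the same vdisk log condition variable is to tally the waiter reasons on the recovery group servers. This is only a rough sketch -- mmdiag --waiters and mmlsrecoverygroup -L --pdisk are the standard commands, but the node list is simply copied from the mmlscluster output above and the grep/sort pipeline is illustrative, not an official procedure.)

# Summarise active waiters by reason across the GSS servers
mmdsh -N gss01a,gss01b,gss02a,gss02b,gss03a,gss03b "/usr/lpp/mmfs/bin/mmdiag --waiters" | grep -o "reason '[^']*'" | sort | uniq -c | sort -rn

# Recovery group detail Sven asked for; run plain mmlsrecoverygroup first to get the RG names
mmlsrecoverygroup
mmlsrecoverygroup RGNAME -L --pdisk

If the counts are dominated by 'wait for permission to append to log' on one node pair, as in the dump above, it suggests log appends are being serialised on that recovery group -- an inference to confirm with the recovery group output, not something the waiters alone prove.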
>> >> Regards, >> Salvatore >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> > >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/ >> IBM] [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM] >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmcneil at kingston.ac.uk Wed Oct 15 14:01:49 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Wed, 15 Oct 2014 14:01:49 +0100 Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705@KUMBX.kuds.kingston.ac.uk> Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bill.Pappas at STJUDE.ORG Thu Oct 16 14:49:57 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Thu, 16 Oct 2014 08:49:57 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** From tmcneil at kingston.ac.uk Fri Oct 17 06:25:00 2014 From: tmcneil at kingston.ac.uk (Mcneil, Tony) Date: Fri, 17 Oct 2014 06:25:00 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Hi Bill, Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel Regards Tony. -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill Sent: 16 October 2014 14:50 To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Hello (Mcneil, Tony) Are you using ctdb? Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Thursday, October 16, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 19 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Hello (Mcneil, Tony) ---------------------------------------------------------------------- Message: 1 Date: Wed, 15 Oct 2014 14:01:49 +0100 From: "Mcneil, Tony" To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Hello Message-ID: <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> Content-Type: text/plain; charset="us-ascii" Hello All, Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. So far we have migrated all our students and approximately 60% of our staff. Looking forward to receiving some interesting posts from the forum. Regards Tony. 
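(A rough sketch of the moving parts in a clustered Samba setup like the one Tony describes, for anyone following along. This reflects the CTDB 2.x-era configuration style he mentions, and every path, IP address and interface name below is an invented example rather than Kingston's actual config; check ctdb(7) and smb.conf(5) for the releases you actually run.)

# /etc/ctdb/nodes - one private, cluster-internal IP per Samba node
10.0.0.11
10.0.0.12

# /etc/ctdb/public_addresses - floating client-facing IPs that CTDB moves between healthy nodes
192.168.104.50/24 eth0
192.168.103.50/24 eth1

# /etc/sysconfig/ctdb - the recovery lock must live on the shared GPFS file system
CTDB_RECOVERY_LOCK=/gpfs/ctdb/.recovery_lock
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=yes
# CTDB_MANAGES_NFS=yes   # only if kernel NFS exports are also under CTDB control

# smb.conf [global] - the one setting that makes Samba cluster-aware
clustering = yes

The public_addresses file is also how different VLANs get served from different interfaces, which is essentially Bill's third question further down the thread.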
Tony McNeil Senior Systems Support Analyst, Infrastructure, Information Services ______________________________________________________________________________ T Internal: 62852 T 020 8417 2852 Kingston University London Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. Please consider the environment before printing this email. This email has been scanned for all viruses by the MessageLabs Email Security System. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 19 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss This email has been scanned for all viruses by the MessageLabs Email Security System. This email has been scanned for all viruses by the MessageLabs Email Security System. From chair at gpfsug.org Tue Oct 21 11:42:10 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Tue, 21 Oct 2014 11:42:10 +0100 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> Message-ID: <54463882.7070009@gpfsug.org> I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. 
Jude Children's Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From rtriendl at ddn.com Tue Oct 21 11:53:37 2014 From: rtriendl at ddn.com (Robert Triendl) Date: Tue, 21 Oct 2014 10:53:37 +0000 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) In-Reply-To: <54463882.7070009@gpfsug.org> References: <8172D639BA76A14AA5C9DE7E13E0CEBE7366265EDB@10.stjude.org> <41FE47CD792F16439D1192AA1087D0E801923DBE67AD@KUMBX.kuds.kingston.ac.uk> <54463882.7070009@gpfsug.org> Message-ID: Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 >> Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. 
Hello (Mcneil, Tony) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> ______________________________________________________________________________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Bill.Pappas at STJUDE.ORG Tue Oct 21 16:59:08 2014 From: Bill.Pappas at STJUDE.ORG (Pappas, Bill) Date: Tue, 21 Oct 2014 10:59:08 -0500 Subject: [gpfsug-discuss] Hello (Mcneil, Tony) (Jez Tucker (Chair)) Message-ID: <8172D639BA76A14AA5C9DE7E13E0CEBE73664E3E8D@10.stjude.org> >>Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb. 1. What procedure did you follow to configure ctdb/samba to work? Was it hard? Could you show us, if permitted? 2. 
Are you also controlling NFS via ctdb? 3. Are you managing multiple IP devices? Eg: ethX0 for VLAN104 and ethX1 for VLAN103 (<- for fast 10GbE users). We use SoNAS and v7000 for most NAS and they use ctdb. Their ctdb results are overall 'ok', with a few bumps here or there. Not too many ctdb PMRs over the 3-4 years on SoNAS. We want to set up ctdb for a GPFS AFM cache that services GPSF data clients. That cache writes to an AFM home (SoNAS). This cache also uses Samba and NFS for lightweight (as in IO, though still important) file access on this cache. It does not use ctdb, but I know it should. I would love to learn how you set your environment up even if it may be a little (or a lot) different. Thanks, Bill Pappas - Manager - Enterprise Storage Group Sr. Enterprise Network Storage Architect Information Sciences Department / Enterprise Informatics Division St. Jude Children's Research Hospital 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 bill.pappas at stjude.org (901) 595-4549 office www.stjude.org Email disclaimer: http://www.stjude.org/emaildisclaimer -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of gpfsug-discuss-request at gpfsug.org Sent: Tuesday, October 21, 2014 6:00 AM To: gpfsug-discuss at gpfsug.org Subject: gpfsug-discuss Digest, Vol 33, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at gpfsug.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at gpfsug.org You can reach the person managing the list at gpfsug-discuss-owner at gpfsug.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Hello (Mcneil, Tony) (Jez Tucker (Chair)) 2. Re: Hello (Mcneil, Tony) (Robert Triendl) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Oct 2014 11:42:10 +0100 From: "Jez Tucker (Chair)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: <54463882.7070009 at gpfsug.org> Content-Type: text/plain; charset=windows-1252; format=flowed I noticed that v7000 Unified is using CTDB v3.3. What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. Is that a question for Amitay? On 17/10/14 06:25, Mcneil, Tony wrote: > Hi Bill, > > Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel > > Regards > Tony. > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill > Sent: 16 October 2014 14:50 > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Hello (Mcneil, Tony) > > Are you using ctdb? > > Thanks, > Bill Pappas - > Manager - Enterprise Storage Group > Sr. Enterprise Network Storage Architect Information Sciences > Department / Enterprise Informatics Division St. 
Jude Children's > Research Hospital > 262 Danny Thomas Place, Mail Stop 504 > Memphis, TN 38105 > bill.pappas at stjude.org > (901) 595-4549 office > www.stjude.org > Email disclaimer: http://www.stjude.org/emaildisclaimer > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of > gpfsug-discuss-request at gpfsug.org > Sent: Thursday, October 16, 2014 6:00 AM > To: gpfsug-discuss at gpfsug.org > Subject: gpfsug-discuss Digest, Vol 33, Issue 19 > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at gpfsug.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at gpfsug.org > > You can reach the person managing the list at > gpfsug-discuss-owner at gpfsug.org > > When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Hello (Mcneil, Tony) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 15 Oct 2014 14:01:49 +0100 > From: "Mcneil, Tony" > To: "gpfsug-discuss at gpfsug.org" > Subject: [gpfsug-discuss] Hello > Message-ID: > > <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac.u > k> > > Content-Type: text/plain; charset="us-ascii" > > Hello All, > > Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' > > We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. > > The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. > > So far we have migrated all our students and approximately 60% of our staff. > > Looking forward to receiving some interesting posts from the forum. > > Regards > Tony. > > Tony McNeil > Senior Systems Support Analyst, Infrastructure, Information Services > ______________________________________________________________________ > ________ > > T Internal: 62852 > T 020 8417 2852 > > Kingston University London > Penrhyn Road, Kingston upon Thames KT1 2EE > www.kingston.ac.uk > > Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. > Please consider the environment before printing this email. > > > This email has been scanned for all viruses by the MessageLabs Email Security System. > -------------- next part -------------- An HTML attachment was > scrubbed... 
> URL: > bcf/attachment-0001.html> > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 33, Issue 19 > ********************************************** > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > > This email has been scanned for all viruses by the MessageLabs Email > Security System. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Tue, 21 Oct 2014 10:53:37 +0000 From: Robert Triendl To: "chair at gpfsug.org" , gpfsug main discussion list Subject: Re: [gpfsug-discuss] Hello (Mcneil, Tony) Message-ID: Content-Type: text/plain; charset="Windows-1252" Yes, I think so? I am;-) On 2014/10/21, at 19:42, Jez Tucker (Chair) wrote: > I noticed that v7000 Unified is using CTDB v3.3. > What magic version is that as it's not in the git tree. Latest tagged is 2.5.4. > Is that a question for Amitay? > > On 17/10/14 06:25, Mcneil, Tony wrote: >> Hi Bill, >> >> Yes, we are, CTDB version: 2.5.2.0.54.gf9fbccb.devel >> >> Regards >> Tony. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Pappas, Bill >> Sent: 16 October 2014 14:50 >> To: gpfsug-discuss at gpfsug.org >> Subject: [gpfsug-discuss] Hello (Mcneil, Tony) >> >> Are you using ctdb? >> >> Thanks, >> Bill Pappas - >> Manager - Enterprise Storage Group >> Sr. Enterprise Network Storage Architect Information Sciences >> Department / Enterprise Informatics Division St. Jude Children's >> Research Hospital >> 262 Danny Thomas Place, Mail Stop 504 Memphis, TN 38105 >> bill.pappas at stjude.org >> (901) 595-4549 office >> www.stjude.org >> Email disclaimer: http://www.stjude.org/emaildisclaimer >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at gpfsug.org >> [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of >> gpfsug-discuss-request at gpfsug.org >> Sent: Thursday, October 16, 2014 6:00 AM >> To: gpfsug-discuss at gpfsug.org >> Subject: gpfsug-discuss Digest, Vol 33, Issue 19 >> >> Send gpfsug-discuss mailing list submissions to >> gpfsug-discuss at gpfsug.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> or, via email, send a message with subject or body 'help' to >> gpfsug-discuss-request at gpfsug.org >> >> You can reach the person managing the list at >> gpfsug-discuss-owner at gpfsug.org >> >> When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." >> >> >> Today's Topics: >> >> 1. Hello (Mcneil, Tony) >> >> >> --------------------------------------------------------------------- >> - >> >> Message: 1 >> Date: Wed, 15 Oct 2014 14:01:49 +0100 >> From: "Mcneil, Tony" >> To: "gpfsug-discuss at gpfsug.org" >> Subject: [gpfsug-discuss] Hello >> Message-ID: >> >> <41FE47CD792F16439D1192AA1087D0E801923DBE6705 at KUMBX.kuds.kingston.ac. 
>> uk> >> >> Content-Type: text/plain; charset="us-ascii" >> >> Hello All, >> >> Thank you for admitting me to the discussion group, I work at Kingston University within the 'Hosting Services Team' >> >> We have recently implemented a 17 node GPFS 3.5.0.20 cluster that serves Samba 4.1.12, the Cluster is configured with 2 failure domains and is located across 2 physical campus's, with a quorum desc disk located at a 3rd site. >> >> The intention is to provide a single namespace for all university unstructured data, the cluster is joined to our AD environment and provisions the home directories for all staff and students (45000+) accounts and will be the repository for all shared data. >> >> So far we have migrated all our students and approximately 60% of our staff. >> >> Looking forward to receiving some interesting posts from the forum. >> >> Regards >> Tony. >> >> Tony McNeil >> Senior Systems Support Analyst, Infrastructure, Information Services >> _____________________________________________________________________ >> _________ >> >> T Internal: 62852 >> T 020 8417 2852 >> >> Kingston University London >> Penrhyn Road, Kingston upon Thames KT1 2EE >> www.kingston.ac.uk >> >> Information in this email and any attachments are confidential, and may not be copied or used by anyone other than the addressee, nor disclosed to any third party without our permission. There is no intention to create any legally binding contract or other commitment through the use of this email. >> Please consider the environment before printing this email. >> >> >> This email has been scanned for all viruses by the MessageLabs Email Security System. >> -------------- next part -------------- An HTML attachment was >> scrubbed... >> URL: >> > 8bcf/attachment-0001.html> >> >> ------------------------------ >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> End of gpfsug-discuss Digest, Vol 33, Issue 19 >> ********************************************** >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. >> >> This email has been scanned for all viruses by the MessageLabs Email >> Security System. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at gpfsug.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 33, Issue 21 ********************************************** From bbanister at jumptrading.com Thu Oct 23 19:35:45 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:35:45 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> I reviewed my RFE request again and notice that it has been marked as ?Private? and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I?m sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn?t access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. I don?t want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. 
We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so, are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far too late to do anything useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on; some discussion about it is here: https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: I'd like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks!
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:50:21 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:50:21 +0000 Subject: [gpfsug-discuss] GPFS User Group at SC14 Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C68@CHI-EXCHANGEW2.w2k.jumptrading.com> I'm going to be attending the GPFS User Group at SC14 this year. 
Here is basic agenda that was provided: GPFS/Elastic Storage User Group Monday, November 17, 2014 3:00 PM-5:00 PM: GPFS/Elastic Storage User Group [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] IBM Software Defined Storage strategy update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Customer presentations [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Future directions such as object storage and OpenStack integration [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage server update [http://www.ibm.com/marketing/campaigns/US-C24508EE/bullet.gif] Elastic Storage roadmap (*NDA required) 5:00 PM: Reception Conference room location provided upon registration. *Attendees must sign a non-disclosure agreement upon arrival or as provided in advance. I think it would be great to review the submitted RFEs and give the user group the chance to vote on them to help promote the RFEs that we care about most. I would also really appreciate any additional details regarding the new GPFS 4.1 deadlock detection facility and any recommended best practices around this new feature. Thanks! -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 76 bytes Desc: image001.gif URL: From chair at gpfsug.org Thu Oct 23 19:52:07 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Thu, 23 Oct 2014 19:52:07 +0100 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> Message-ID: <54494E57.90304@gpfsug.org> Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: > > I reviewed my RFE request again and notice that it has been marked as > ?Private? and I think this is preventing people from voting on this > RFE. 
I have talked to others that would like to vote for this RFE. > > How can I set the RFE to public so that others may vote on it? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Bryan Banister > *Sent:* Friday, October 10, 2014 12:13 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > A DMAPI daemon solution puts a dependency on the DMAPI daemon for the > file system to be mounted. I think it would be better to have > something like what I requested in the RFE that would hopefully not > have this dependency, and would be optional/configurable. I?m sure we > would all prefer something that is supported directly by IBM (hence > the RFE!) > > Thanks, > > -Bryan > > Ps. Hajo said that he couldn?t access the RFE to vote on it: > > I would like to support the RFE but i get: > > "You cannot access this page because you do not have the proper > authority." > > Cheers > > Hajo > > Here is what the RFE website states: > > Bookmarkable > URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 > A unique URL that you can bookmark and share with others. > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org] *On Behalf Of *Sven Oehme > *Sent:* Friday, October 10, 2014 11:52 AM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > The only DMAPI agent i am aware of is a prototype that was written by > tridge in 2008 to demonstrate a file based HSM system for GPFS. > > its a working prototype, at least it worked in 2008 :-) > > you can get the source code from git : > > http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary > > just to be clear, there is no Support for this code. we obviously > Support the DMAPI interface , but the code that exposes the API is > nothing we provide Support for. > > thx. Sven > > On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > > wrote: > > I agree with Ben, I think. > > I don?t want to use the ILM policy engine as that puts a direct > workload against the metadata storage and server resources. We need > something out-of-band, out of the file system operational path. > > Is there a simple DMAPI daemon that would log the file system > namespace changes that we could use? > > If so are there any limitations? > > And is it possible to set this up in an HA environment? > > Thanks! > > -Bryan > > *From:*gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] *On Behalf Of *Ben De Luca > *Sent:* Friday, October 10, 2014 11:10 AM > > > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] GPFS RFE promotion > > querying this through the policy engine is far to late to do any thing > useful with it > > On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: > > Ben, > > to get lists of 'Hot Files' turn File Heat on , some discussion about > it is here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 > > thx. Sven > > On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: > > Id like this to see hot files > > On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > > wrote: > > Hmm... I didn't think to use the DMAPI interface. That could be a > nice option. Has anybody done this already and are there any examples > we could look at? > > Thanks! 
> -Bryan > > > -----Original Message----- > From: gpfsug-discuss-bounces at gpfsug.org > > [mailto:gpfsug-discuss-bounces at gpfsug.org > ] On Behalf Of Phil Pishioneri > Sent: Friday, October 10, 2014 10:04 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS RFE promotion > > On 10/9/14 3:31 PM, Bryan Banister wrote: > > > > Just wanted to pass my GPFS RFE along: > > > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > > 0458 > > > > > > *Description*: > > > > GPFS File System Manager should provide the option to log all file and > > directory operations that occur in a file system, preferably stored in > > a TSD (Time Series Database) that could be quickly queried through an > > API interface and command line tools. ... > > > > The rudimentaries for this already exist via the DMAPI interface in > GPFS (used by the TSM HSM product). A while ago this was posted to the > IBM GPFS DeveloperWorks forum: > > On 1/3/11 10:27 AM, dWForums wrote: > > Author: > > AlokK.Dhir > > > > Message: > > We have a proof of concept which uses DMAPI to listens to and > passively logs filesystem changes with a non blocking listener. This > log can be used to generate backup sets etc. Unfortunately, a bug in > the current DMAPI keeps this approach from working in the case of > certain events. I am told 3.4.0.3 may contain a fix. We will gladly > share the code once it is working. > > -Phil > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. 
The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ------------------------------------------------------------------------ > > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > ------------------------------------------------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged > information. If you are not the intended recipient, you are hereby > notified that any review, dissemination or copying of this email is > strictly prohibited, and to please notify the sender immediately and > destroy this email and any attachments. Email transmission cannot be > guaranteed to be secure or error-free. The Company, therefore, does > not make any guarantees as to the completeness or accuracy of this > email or any attachments. This email is for informational purposes > only and does not constitute a recommendation, offer, request or > solicitation of any kind to buy, sell, subscribe, redeem or perform > any type of transaction of a financial product. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Oct 23 19:59:52 2014 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Oct 2014 18:59:52 +0000 Subject: [gpfsug-discuss] GPFS RFE promotion In-Reply-To: <54494E57.90304@gpfsug.org> References: <21BC488F0AEA2245B2C3E83FC0B33DBB8D617A@CHI-EXCHANGEW2.w2k.jumptrading.com> <5437F562.1080609@psu.edu> <21BC488F0AEA2245B2C3E83FC0B33DBB8D9EF1@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA20E@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8DA539@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB947BAF@CHI-EXCHANGEW2.w2k.jumptrading.com> <54494E57.90304@gpfsug.org> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB947C98@CHI-EXCHANGEW2.w2k.jumptrading.com> Looks like IBM decides if the RFE is public or private: Q: What are private requests? 
A: Private requests are requests that can be viewed only by IBM, the request author, members of a group with the request in its watchlist, and users with the request in their watchlist. Only the author of the request can add a private request to their watchlist or a group watchlist. Private requests appear in various public views, such as Top 20 watched or Planned requests; however, only limited information about the request will be displayed. IBM determines the default request visibility of a request, either public or private, and IBM may change the request visibility at any time. If you are watching a request and have subscribed to email notifications, you will be notified if the visibility of the request changes. I'm submitting a request to make the RFE public so that others may vote on it now, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jez Tucker (Chair) Sent: Thursday, October 23, 2014 1:52 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] GPFS RFE promotion Hi Bryan Unsure, to be honest. When I added all the GPFS UG RFEs in, I didn't see an option to make the RFE private. There's private fields, but not a 'make this RFE private' checkbox or such. This one may be better directed to the GPFS developer forum / redo the RFE. RE: GPFS UG RFEs, GPFS devs will be updating those imminently and we'll be feeding info back to the group. Jez On 23/10/14 19:35, Bryan Banister wrote: I reviewed my RFE request again and notice that it has been marked as "Private" and I think this is preventing people from voting on this RFE. I have talked to others that would like to vote for this RFE. How can I set the RFE to public so that others may vote on it? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Bryan Banister Sent: Friday, October 10, 2014 12:13 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion A DMAPI daemon solution puts a dependency on the DMAPI daemon for the file system to be mounted. I think it would be better to have something like what I requested in the RFE that would hopefully not have this dependency, and would be optional/configurable. I'm sure we would all prefer something that is supported directly by IBM (hence the RFE!) Thanks, -Bryan Ps. Hajo said that he couldn't access the RFE to vote on it: I would like to support the RFE but i get: "You cannot access this page because you do not have the proper authority." Cheers Hajo Here is what the RFE website states: Bookmarkable URL:http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=60458 A unique URL that you can bookmark and share with others. From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Friday, October 10, 2014 11:52 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion The only DMAPI agent i am aware of is a prototype that was written by tridge in 2008 to demonstrate a file based HSM system for GPFS. its a working prototype, at least it worked in 2008 :-) you can get the source code from git : http://git.samba.org/rsync.git/?p=tridge/hacksm.git;a=summary just to be clear, there is no Support for this code. we obviously Support the DMAPI interface , but the code that exposes the API is nothing we provide Support for. thx. Sven On Fri, Oct 10, 2014 at 9:15 AM, Bryan Banister > wrote: I agree with Ben, I think. 
I don't want to use the ILM policy engine as that puts a direct workload against the metadata storage and server resources. We need something out-of-band, out of the file system operational path. Is there a simple DMAPI daemon that would log the file system namespace changes that we could use? If so are there any limitations? And is it possible to set this up in an HA environment? Thanks! -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ben De Luca Sent: Friday, October 10, 2014 11:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion querying this through the policy engine is far to late to do any thing useful with it On Fri, Oct 10, 2014 at 11:51 PM, Sven Oehme > wrote: Ben, to get lists of 'Hot Files' turn File Heat on , some discussion about it is here : https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014920653 thx. Sven On Fri, Oct 10, 2014 at 8:26 AM, Ben De Luca > wrote: Id like this to see hot files On Fri, Oct 10, 2014 at 11:08 PM, Bryan Banister > wrote: Hmm... I didn't think to use the DMAPI interface. That could be a nice option. Has anybody done this already and are there any examples we could look at? Thanks! -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Phil Pishioneri Sent: Friday, October 10, 2014 10:04 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS RFE promotion On 10/9/14 3:31 PM, Bryan Banister wrote: > > Just wanted to pass my GPFS RFE along: > > http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=6 > 0458 > > > *Description*: > > GPFS File System Manager should provide the option to log all file and > directory operations that occur in a file system, preferably stored in > a TSD (Time Series Database) that could be quickly queried through an > API interface and command line tools. ... > The rudimentaries for this already exist via the DMAPI interface in GPFS (used by the TSM HSM product). A while ago this was posted to the IBM GPFS DeveloperWorks forum: On 1/3/11 10:27 AM, dWForums wrote: > Author: > AlokK.Dhir > > Message: > We have a proof of concept which uses DMAPI to listens to and passively logs filesystem changes with a non blocking listener. This log can be used to generate backup sets etc. Unfortunately, a bug in the current DMAPI keeps this approach from working in the case of certain events. I am told 3.4.0.3 may contain a fix. We will gladly share the code once it is working. -Phil _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
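For anyone who wants to try the File Heat approach Sven points to in the thread above, a minimal sketch follows. The tunable names (fileHeatPeriodMinutes, fileHeatLossPercent) and the FILE_HEAT policy attribute come from the GPFS documentation of that era, but the file system name 'gpfs01', the list name and the numeric values are placeholders for illustration only, not a tested recommendation:

--
# Enable file heat tracking cluster-wide (values here are illustrative).
mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10

# Policy that lists files ordered by heat; 'hotlist' is an arbitrary list name.
cat > /tmp/hotfiles.pol <<'EOF'
RULE EXTERNAL LIST 'hotlist' EXEC ''
RULE 'hot' LIST 'hotlist' WEIGHT(FILE_HEAT) SHOW(VARCHAR(FILE_HEAT) || ' ' || VARCHAR(KB_ALLOCATED))
EOF

# Generate the candidate list only; -I defer means nothing is migrated or deleted.
mmapplypolicy gpfs01 -P /tmp/hotfiles.pol -f /tmp/hotfiles -I defer
--

With -I defer the run just writes the weighted file list under the -f prefix, which is usually enough to see which files are currently "hot".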
From bbanister at jumptrading.com Fri Oct 24 19:58:07 2014
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 24 Oct 2014 18:58:07 +0000
Subject: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations
In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB8CCC32@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CCF14@CHI-EXCHANGEW2.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB8CEF61@CHI-EXCHANGEW2.w2k.jumptrading.com>
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB94C513@CHI-EXCHANGEW2.w2k.jumptrading.com>

It is with humble apology and great relief that I was wrong about the AFM limitation that I believed existed in the configuration I explained below. The problem with my configuration was that the NSD client cluster had not been completely updated to GPFS 4.1.0-3; a few nodes are still running 3.5.0-20, which currently prevents upgrading the GPFS file system release version (e.g. mmchconfig release=LATEST) to 4.1.0-3. This GPFS configuration "requirement" isn't documented in the Advanced Admin Guide, but it makes sense that it is required, since only the GPFS 4.1 release supports the GPFS protocol for AFM fileset targets. I have tested the configuration with a new NSD client cluster and it works as desired. Thanks Kalyan and others for their feedback.

Our file system namespace is unfortunately filled with small files that do not allow AFM to parallelize the data transfers across multiple nodes. And unfortunately AFM will only allow one gateway node per fileset to perform the prefetch namespace scan operation, which is incredibly slow as I stated before. We were only seeing roughly 100 "Queue numExec" operations per second. I think this performance is gated by the directory namespace scan of the single gateway node.

Thanks!
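Given that constraint (one prefetch job per fileset, but different filesets in parallel), the only parallelism available at this level is one job per AFM fileset. A rough sketch of that flow, assuming the per-fileset file lists have already been generated with mmapplypolicy as discussed in this thread; the device name 'newfs', the fileset names and the list-file paths are placeholders:

--
# One mmafmctl prefetch job per AFM fileset; jobs on different filesets may run concurrently.
# 'newfs', the fileset names and the list files are hypothetical.
for f in fileset01 fileset02 fileset03; do
    mmafmctl newfs prefetch -j $f --home-inode-file /var/tmp/prefetch.$f.list &
done
wait
--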
-Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 10:21 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations some clarifications inline: Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/07/2014 08:12 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org Interesting that AFM is supposed to work in a multi-cluster environment. We were using GPFS on the backend. The new GPFS file system was AFM linked over GPFS protocol to the old GPFS file system using the standard multi-cluster mount. The "gateway" nodes in the new cluster mounted the old file system. All systems were connected over the same QDR IB fabric. The client compute nodes in the third cluster mounted both the old and new file systems. I looked for waiters on the client and NSD servers of the new file system when the problem occurred, but none existed. I tried stracing the `ls` process, but it reported nothing and the strace itself become unkillable. There were no error messages in any GPFS or system logs related to the `ls` fail. NFS clients accessing cNFS servers in the new cluster also worked as expected. The `ls` from the NFS client in an AFM fileset returned the expected directory listing. Thus all symptoms indicated the configuration wasn't supported. I may try to replicate the problem in a test environment at some point. However AFM isn't really a great solution for file data migration between file systems for these reasons: 1) It requires the complicated AFM setup, which requires manual operations to sync data between the file systems (e.g. mmapplypolicy run on old file system to get file list THEN mmafmctl prefetch operation on the new AFM fileset to pull data). No way to have it simply keep the two namespaces in sync. And you must be careful with the "Local Update" configuration not to modify basically ANY file attributes in the new AFM fileset until a CLEAN cutover of your application is performed, otherwise AFM will remove the link of the file to data stored on the old file system. This is concerning and it is not easy to detect that this event has occurred. --> The LU mode is meant for scenarios where changes in cache are not --> meant to be pushed back to old filesystem. If thats not whats desired then other AFM modes like IW can be used to keep namespace in sync and data can flow from both sides. Typically, for data migration --metadata-only to pull in the full namespace first and data can be migrated on demand or via policy as outlined above using prefetch cmd. AFM setup should be extension to GPFS multi-cluster setup when using GPFS backend. 2) The "Progressive migration with no downtime" directions actually states that there is downtime required to move applications to the new cluster, THUS DOWNTIME! And it really requires a SECOND downtime to finally disable AFM on the file set so that there is no longer a connection to the old file system, THUS TWO DOWNTIMES! --> I am not sure I follow the first downtime. If applications have to start using the new filesystem, then they have to be informed accordingly. If this can be done without bringing down applications, then there is no DOWNTIME. 
Regarding, second downtime, you are right, disabling AFM after data migration requires unlink and hence downtime. But there is a easy workaround, where revalidation intervals can be increased to max or GW nodes can be unconfigured without downtime with same effect. And disabling AFM can be done at a later point during maintenance window. We plan to modify this to have this done online aka without requiring unlink of the fileset. This will get prioritized if there is enough interest in AFM being used in this direction. 3) The prefetch operation can only run on a single node thus is not able to take any advantage of the large number of NSD servers supporting both file systems for the data migration. Multiple threads from a single node just doesn't cut it due to single node bandwidth limits. When I was running the prefetch it was only executing roughly 100 " Queue numExec" operations per second. The prefetch operation for a directory with 12 Million files was going to take over 33 HOURS just to process the file list! --> Prefetch can run on multiple nodes by configuring multiple GW nodes --> and enabling parallel i/o as specified in the docs..link provided below. Infact it can parallelize data xfer to a single file and also do multiple files in parallel depending on filesizes and various tuning params. 4) In comparison, parallel rsync operations will require only ONE downtime to run a final sync over MULTIPLE nodes in parallel at the time that applications are migrated between file systems and does not require the complicated AFM configuration. Yes, there is of course efforts to breakup the namespace for each rsync operations. This is really what AFM should be doing for us... chopping up the namespace intelligently and spawning prefetch operations across multiple nodes in a configurable way to ensure performance is met or limiting overall impact of the operation if desired. --> AFM can be used for data migration without any downtime dictated by --> AFM (see above) and it can infact use multiple threads on multiple nodes to do parallel i/o. AFM, however, is great for what it is intended to be, a cached data access mechanism across a WAN. Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Kalyan Gunda Sent: Tuesday, October 07, 2014 12:03 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, AFM supports GPFS multi-cluster..and we have customers already using this successfully. Are you using GPFS backend? Can you explain your configuration in detail and if ls is hung it would have generated some long waiters. Maybe this should be pursued separately via PMR. You can ping me the details directly if needed along with opening a PMR per IBM service process. As for as prefetch is concerned, right now its limited to one prefetch job per fileset. Each job in itself is multi-threaded and can use multi-nodes to pull in data based on configuration. "afmNumFlushThreads" tunable controls the number of threads used by AFM. This parameter can be changed via mmchfileset cmd (mmchfileset pubs doesn't show this param for some reason, I will have that updated.) eg: mmchfileset fs1 prefetchIW -p afmnumflushthreads=5 Fileset prefetchIW changed. 
List the change: mmlsfileset fs1 prefetchIW --afm -L Filesets in file system 'fs1': Attributes for fileset prefetchIW: =================================== Status Linked Path /gpfs/fs1/prefetchIW Id 36 afm-associated Yes Target nfs://hs21n24/gpfs/fs1/singleTargetToUseForPrefetch Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 5 Prefetch Threshold 0 (default) Eviction Enabled yes (default) AFM parallel i/o can be setup such that multiple GW nodes can be used to pull in data..more details are available here http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmparallelio.htm and this link outlines tuning params for parallel i/o along with others: http://www-01.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs200.doc/bl1adv_afmtuning.htm%23afmtuning Regards Kalyan GPFS Development EGL D Block, Bangalore From: Bryan Banister > To: gpfsug main discussion list > Date: 10/06/2014 09:57 PM Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Sent by: gpfsug-discuss-bounces at gpfsug.org We are using 4.1.0.3 on the cluster with the AFM filesets, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Monday, October 06, 2014 11:28 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] AFM limitations in a multi-cluster environment, slow prefetch operations Hi Bryan, in 4.1 AFM uses multiple threads for reading data, this was different in 3.5 . what version are you using ? thx. Sven On Mon, Oct 6, 2014 at 8:36 AM, Bryan Banister > wrote: Just an FYI to the GPFS user community, We have been testing out GPFS AFM file systems in our required process of file data migration between two GPFS file systems. The two GPFS file systems are managed in two separate GPFS clusters. We have a third GPFS cluster for compute systems. We created new independent AFM filesets in the new GPFS file system that are linked to directories in the old file system. Unfortunately access to the AFM filesets from the compute cluster completely hang. Access to the other parts of the second file system is fine. This limitation/issue is not documented in the Advanced Admin Guide. Further, we performed prefetch operations using a file mmafmctl command, but the process appears to be single threaded and the operation was extremely slow as a result. According to the Advanced Admin Guide, it is not possible to run multiple prefetch jobs on the same fileset: GPFS can prefetch the data using the mmafmctl Device prefetch ?j FilesetName command (which specifies a list of files to prefetch). Note the following about prefetching: v It can be run in parallel on multiple filesets (although more than one prefetching job cannot be run in parallel on a single fileset). We were able to quickly create the ?--home-inode-file? from the old file system using the mmapplypolicy command as the documentation describes. However the AFM prefetch operation is so slow that we are better off running parallel rsync operations between the file systems versus using the GPFS AFM prefetch operation. 
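For comparison, a minimal sketch of the parallel rsync alternative mentioned above, assuming the top-level directories of the old file system are a sensible unit of work; the mount points and the parallelism factor are placeholders, and a final sync pass is still needed at cutover:

--
# One rsync per top-level directory, at most 8 running at a time.
# /gpfs_old and /gpfs_new are hypothetical mount points.
cd /gpfs_old && ls -d */ | xargs -P 8 -I{} rsync -aH --numeric-ids {} /gpfs_new/{}
--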
Cheers,
-Bryan

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information.
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Wed Oct 29 13:59:40 2014 From: chair at gpfsug.org (Jez Tucker (Chair)) Date: Wed, 29 Oct 2014 13:59:40 +0000 Subject: [gpfsug-discuss] Storagebeers, Nov 13th Message-ID: <5450F2CC.3070302@gpfsug.org> Hello all, I just thought I'd make you all aware of a social, #storagebeers on Nov 13th organised by Martin Glassborow, one of our UG members. http://www.gpfsug.org/2014/10/29/storagebeers-13th-nov/ I'll be popping along. Hopefully see you there. Jez From Jared.Baker at uwyo.edu Wed Oct 29 15:31:31 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 15:31:31 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings Message-ID: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Wed Oct 29 16:33:22 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 16:33:22 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414600402.24518.216.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 15:31 +0000, Jared David Baker wrote: [SNIP] > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the > data is intact, but there is no crucial data on this file system yet, > so I could recreate the file system, but I would like to learn how to > solve a problem like this. Thanks for all help and information. > At an educated guess and assuming the disks are visible to the OS (try dd'ing the first few GB to /dev/null) it looks like you have managed at some point to wipe the NSD descriptors from the disks - ouch. The file system will continue to work after this has been done, but if you start rebooting the NSD servers you will find after the last one has been restarted the file system is unmountable. Simply unmounting the file systems from each NDS server is also probably enough. For good measure unless you have a backup of the NSD descriptors somewhere it is also an unrecoverable condition. Lucky for you if there is nothing on it that matters. My suggestion is re-examine what you did during the firmware upgrade, as that is the most likely culprit. However bear in mind that it could have been days or even weeks ago that it occurred. I would raise a PMR to be sure, but it looks to me like you will be recreating the file system from scratch. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 16:42:26 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 09:42:26 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hello, there are multiple reasons why the descriptors can not be found . there was a recent change in firmware behaviors on multiple servers that restore the GPT table from a disk if the disk was used as a OS disk before used as GPFS disks. some infos here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e if thats the case there is a procedure to restore them. it could also be something very trivial , e.g. that your multipath mapping changed and your nsddevice file actually just prints out devices instead of scanning them and create a list on the fly , so GPFS ignores the new path to the disks. in any case , opening a PMR and work with Support is the best thing to do before causing any more damage. if the file-system is still mounted don't unmount it under any circumstances as Support needs to extract NSD descriptor information from it to restore them easily. Sven On Wed, Oct 29, 2014 at 8:31 AM, Jared David Baker wrote: > Hello all, > > > > I?m hoping that somebody can shed some light on a problem that I > experienced yesterday. I?ve been working with GPFS for a couple months as > an admin now, but I?ve come across a problem that I?m unable to see the > answer to. 
Hopefully the solution is not listed somewhere blatantly on the > web, but I spent a fair amount of time looking last night. Here is the > situation: yesterday, I needed to update some firmware on a Mellanox HCA > FDR14 card and reboot one of our GPFS servers and repeat for the sister > node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, > upon reboot, the server seemed to lose the path mappings to the multipath > devices for the NSDs. Output below: > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch > > > > Disk name NSD volume ID Device Node name > Remarks > > > --------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini > (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > Also, the system was working fantastically before the reboot, but now I?m > unable to mount the GPFS filesystem. The disk names look like they are > there and mapped to the NSD volume ID, but there is no Device. I?ve created > the /var/mmfs/etc/nsddevices script and it has the following output with > user return 0: > > > > -- > > [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices > > mapper/dcs3800u31a_lun0 dmm > > mapper/dcs3800u31a_lun10 dmm > > mapper/dcs3800u31a_lun2 dmm > > mapper/dcs3800u31a_lun4 dmm > > mapper/dcs3800u31a_lun6 dmm > > mapper/dcs3800u31a_lun8 dmm > > mapper/dcs3800u31b_lun1 dmm > > mapper/dcs3800u31b_lun11 dmm > > mapper/dcs3800u31b_lun3 dmm > > mapper/dcs3800u31b_lun5 dmm > > mapper/dcs3800u31b_lun7 dmm > > mapper/dcs3800u31b_lun9 dmm > > [root at mmmnsd5 ~]# > > -- > > > > That output looks correct to me based on the documentation. 
So I went > digging in the GPFS log file and found this relevant information: > > > > -- > > Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No > such NSD locally found. > > Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No > such NSD locally found. > > Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. > No such NSD locally found. > > Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. > No such NSD locally found. > > -- > > > > Okay, so the NSDs don?t seem to be able to be found, so I attempt to > rediscover the NSD by executing the command mmnsddiscover: > > > > -- > > [root at mmmnsd5 ~]# mmnsddiscover > > mmnsddiscover: Attempting to rediscover the disks. This may take a while > ... > > mmnsddiscover: Finished. > > [root at mmmnsd5 ~]# > > -- > > > > I was hoping that finished, but then upon restarting GPFS, there was no > success. 
Verifying with mmlsnsd -X -f gscratch > > > > -- > > [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch > > > > Disk name NSD volume ID Device Devtype Node > name Remarks > > > --------------------------------------------------------------------------------------------------- > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun0 0A62001B54235577 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun10 0A62001C542355AA - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun2 0A62001C54235581 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun4 0A62001B5423558B - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd6.infini (not found) server node > > dcs3800u31a_lun6 0A62001C54235595 - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd5.infini (not found) server node > > dcs3800u31a_lun8 0A62001B5423559F - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun1 0A62001B5423557C - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun11 0A62001C542355AF - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun3 0A62001C54235586 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun5 0A62001B54235590 - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd6.infini (not found) server node > > dcs3800u31b_lun7 0A62001C5423559A - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd5.infini (not found) server node > > dcs3800u31b_lun9 0A62001B542355A4 - - > mminsd6.infini (not found) server node > > > > [root at mmmnsd5 ~]# > > -- > > > > I?m wondering if somebody has seen this type of issue before? Will > recreating my NSDs destroy the filesystem? I?m thinking that all the data > is intact, but there is no crucial data on this file system yet, so I could > recreate the file system, but I would like to learn how to solve a problem > like this. Thanks for all help and information. > > > > Regards, > > > > Jared > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oester at gmail.com Wed Oct 29 16:46:35 2014 From: oester at gmail.com (Bob Oesterlin) Date: Wed, 29 Oct 2014 11:46:35 -0500 Subject: [gpfsug-discuss] GPFS 4.1 event "deadlockOverload" Message-ID: I posted this to developerworks, but haven't seen a response. This is NOT the same event "deadlockDetected" that is documented in the 4.1 Probelm Determination Guide. I see these errors -in my mmfslog on the cluster master. I just upgraded to 4.1, and I can't find this documented anywhere. What is "event deadlockOverload" ? And what script would it call? The nodes in question are part of a CNFS group. 
Mon Oct 27 10:11:08.848 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:08.849 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Mon Oct 27 10:11:14.478 2014: [I] Received overload notification request from 10.30.42.26 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.869 2014: [I] Received overload notification request from 10.30.42.30 to forward to all nodes in cluster XXX Mon Oct 27 10:11:58.870 2014: [I] Calling User Exit Script gpfsNotifyOverload: event deadlockOverload, Async command /usr/lpp/mmfs/bin/mmcommon. Bob Oesterlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:19:14 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:19:14 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From oehmes at gmail.com Wed Oct 29 17:22:30 2014 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:22:30 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: > On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > > Hello, > > > > > > there are multiple reasons why the descriptors can not be found . > > > > > > there was a recent change in firmware behaviors on multiple servers > > that restore the GPT table from a disk if the disk was used as a OS > > disk before used as GPFS disks. some infos > > here : > https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > > > > if thats the case there is a procedure to restore them. > > I have been categorically told by IBM in no uncertain terms if the NSD > descriptors have *ALL* been wiped then it is game over for that file > system; restore from backup is your only option. 
> > If the GPT table has been "restored" and overwritten the NSD descriptors > then you are hosed. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Oct 29 17:29:09 2014 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 29 Oct 2014 17:29:09 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat were categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Jared.Baker at uwyo.edu Wed Oct 29 17:30:00 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 17:30:00 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> Message-ID: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Thanks for all the information. I'm not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I've got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your system is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard > wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option.
If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 17:45:38 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 10:45:38 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> <4066dca761054739bf4c077158fbb37a@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Jared, if time permits i would open a PMR to check what happened. as i stated in my first email it could be multiple things, the GPT restore is only one possible of many explanations and some more simple reasons could explain what you see as well. get somebody from support check the state and then we know for sure. it would give you also peace of mind that it doesn't happen again when you are in production. if you feel its not worth and you don't wipe any important information start over again. btw. the newer BIOS versions of IBM servers have a option from preventing the GPT issue from happening : [root at gss02n1 ~]# asu64 showvalues DiskGPTRecovery.DiskGPTRecovery IBM Advanced Settings Utility version 9.61.85B Licensed Materials - Property of IBM (C) Copyright IBM Corp. 2007-2014 All Rights Reserved IMM LAN-over-USB device 0 enabled successfully. Successfully discovered the IMM via SLP. Discovered IMM at IP address 169.254.95.118 Connected to IMM at IP address 169.254.95.118 DiskGPTRecovery.DiskGPTRecovery=None= if you set it the GPT will never get restored. you would have to set this on all the nodes that have access to the disks. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 10:30 AM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks for all the information. I?m not exactly sure what happened during the firmware update of the HCAs (another admin). But I do have all the stanza files that I used to create the NSDs. Possible to utilize them to just regenerate the NSDs or is it consensus that the FS is gone? As the system was not in production (yet) I?ve got no problem delaying the release and running some tests to verify possible fixes. The system was already unmounted, so it is a completely inactive FS across the cluster. Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 11:23 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings if you still have a running system you can extract the information and recreate the descriptors. if your sytem is already down, this is not possible any more. which is why i suggested to open a PMR as the Support team will be able to provide the right guidance and help . 
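On the DiskGPTRecovery setting shown above: the same ASU tool that displays the value can also change it, which is how the GPT-restore behaviour gets switched off. This is a sketch only; the exact setting name and the allowed values depend on the machine type and uEFI level, so confirm with asu64 showvalues before changing anything:

--
# Assumed example: set DiskGPTRecovery so the uEFI never rewrites a GPT
# header over a disk GPFS is using, then re-read the value to confirm.
asu64 set DiskGPTRecovery.DiskGPTRecovery "None"
asu64 showvalues DiskGPTRecovery.DiskGPTRecovery
--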
Sven On Wed, Oct 29, 2014 at 10:19 AM, Jonathan Buzzard wrote: On Wed, 2014-10-29 at 09:42 -0700, Sven Oehme wrote: > Hello, > > > there are multiple reasons why the descriptors can not be found . > > > there was a recent change in firmware behaviors on multiple servers > that restore the GPT table from a disk if the disk was used as a OS > disk before used as GPFS disks. some infos > here : https://www.ibm.com/developerworks/community/forums/html/topic?id=27f98aab-aa41-41f4-b6b7-c87d3ce87b9e > > > if thats the case there is a procedure to restore them. I have been categorically told by IBM in no uncertain terms if the NSD descriptors have *ALL* been wiped then it is game over for that file system; restore from backup is your only option. If the GPT table has been "restored" and overwritten the NSD descriptors then you are hosed. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Wed Oct 29 18:57:28 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 18:57:28 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <1414603154.24518.225.camel@buzzard.phy.strath.ac.uk> , <1414603749.24518.227.camel@buzzard.phy.strath.ac.uk> Message-ID: SOBAR is your friend at that point? Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jonathan Buzzard [jonathan at buzzard.me.uk] Sent: Wednesday, October 29, 2014 1:29 PM To: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Server lost NSD mappings On Wed, 2014-10-29 at 10:22 -0700, Sven Oehme wrote: > if you still have a running system you can extract the information and > recreate the descriptors. We had a running system with the file system still mounted on some nodes but all the NSD descriptors wiped, and I repeat where categorically told by IBM that nothing could be done and to restore the file system from backup. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Oct 29 19:07:34 2014 From: ewahl at osc.edu (Ed Wahl) Date: Wed, 29 Oct 2014 19:07:34 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? 
multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I?m hoping that somebody can shed some light on a problem that I experienced yesterday. I?ve been working with GPFS for a couple months as an admin now, but I?ve come across a problem that I?m unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I?m unable to mount the GPFS filesystem. 
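To make the mmsdrfs suggestion above concrete: the GPFS configuration file lives in the same place on every node, so recovering it on a rebuilt node is usually a straight copy from a healthy peer, or the equivalent mmsdrrestore call. A sketch, using the NSD server names from this thread as stand-ins; note that this only restores cluster configuration data and does not touch the NSD descriptors on the disks themselves:

--
# Copy the cluster configuration file from a healthy NSD server ...
scp mminsd6:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs
# ... or have GPFS fetch and re-apply it from that node.
mmsdrrestore -p mminsd6
--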
The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I?ve created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don?t seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
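For anyone who has not written one, a /var/mmfs/etc/nsddevices user exit of the kind whose output is quoted above is only a few lines. The sketch below is modelled on the sample shipped with GPFS (/usr/lpp/mmfs/samples/nsddevices.sample), with the device pattern guessed from the LUN names in this thread; it emits "device devicetype" pairs and returns 0 so that GPFS considers only the multipath aliases rather than the underlying sd* paths:

--
#!/bin/ksh
# Emit one "device devicetype" line per dm-multipath alias.
CONTROLLER_REGEX='dcs3800u31[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep "$CONTROLLER_REGEX" )
do
    echo mapper/$dev dmm
done
# Returning 0 tells GPFS to use only the devices listed above and to skip
# its built-in device discovery.
return 0
--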
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I?m wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I?m thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared From Jared.Baker at uwyo.edu Wed Oct 29 19:27:26 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:27:26 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at us.ibm.com Wed Oct 29 19:41:22 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 12:41:22 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path 
pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. 
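Since the stanza files used to create these NSDs come up earlier in the thread, this is roughly what a single mmcrnsd stanza looks like, including the failureGroup field that was left to default here. The values are illustrative guesses based on the names in this thread, not the actual stanza file, and re-running mmcrnsd writes fresh descriptors, so on its own it is not a recovery path for data already on the disks:

--
%nsd:
  device=/dev/mapper/dcs3800u31a_lun0
  nsd=dcs3800u31a_lun0
  servers=mminsd5,mminsd6
  usage=dataAndMetadata
  failureGroup=1
  pool=system
--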
Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. 
Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. 
No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. 
Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 19:46:23 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 19:46:23 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
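To run the strings check that produced the "EFI PART" output above across every LUN at once rather than just dm-0, a small read-only loop is enough; a sketch, with the device glob matching the alias naming used in this thread:

--
# Survey what currently sits in the header area of each LUN (read-only).
for d in /dev/mapper/dcs3800u31*_lun*; do
    echo "== $d =="
    dd if="$d" bs=1k count=32 2>/dev/null | strings | head -5
done
--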
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at us.ibm.com Wed Oct 29 20:02:53 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:02:53 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
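A side note on the check Sven describes above: the same dd/strings test can be repeated across every multipath LUN to see which devices still carry a GPFS NSD descriptor and which now show a foreign GPT signature ("EFI PART") instead. The loop below is only a sketch; the /dev/mapper/dcs3800u31[ab]_lun* naming is assumed from the listings earlier in this thread.
--
#!/bin/ksh
# Sketch: scan each multipath LUN for a surviving GPFS NSD descriptor.
# The dcs3800u31[ab]_lun* device names are assumed from this thread.
for dev in /dev/mapper/dcs3800u31[ab]_lun*
do
    echo "=== $dev ==="
    # A healthy NSD prints an "NSD descriptor ... created by GPFS" string
    # from its first 32 KiB; "EFI PART" instead means a GPT label sits there.
    dd if=$dev bs=1k count=32 2>/dev/null | strings | egrep 'NSD descriptor|EFI PART'
done
--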
From Jared.Baker at uwyo.edu Wed Oct 29 20:13:06 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:13:06 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com>
Apologies Sven, w/o comments below:
--
#!/bin/ksh
CONTROLLER_REGEX='[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
    echo mapper/$dev dmm
    #echo mapper/$dev generic
done
# Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover),
return 0
--
Best, Jared
From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx.
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
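For readers following the nsddevices user exit above: GPFS runs /var/mmfs/etc/nsddevices ahead of its built-in discovery, expects one "device devicetype" line per disk on stdout, and skips /usr/lpp/mmfs/bin/mmdevdiscover when the exit returns 0. The sketch below is only an illustration of that contract, not the script used on this cluster; it combines the exit with the descriptor check from earlier in the thread so that a device is reported only while it still carries an NSD descriptor.
--
#!/bin/ksh
# Illustrative variant of /var/mmfs/etc/nsddevices (not the one from this
# thread): report a mapper device only if it still has an NSD descriptor.
CONTROLLER_REGEX='[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
    if dd if=/dev/mapper/$dev bs=1k count=32 2>/dev/null | strings | grep -q 'NSD descriptor'
    then
        echo mapper/$dev dmm
    fi
done
# return 0 bypasses the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover);
# a non-zero return lets the built-in discovery run as well.
return 0
--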
From oehmes at us.ibm.com Wed Oct 29 20:25:10 2014 From: oehmes at us.ibm.com (Sven Oehme) Date: Wed, 29 Oct 2014 13:25:10 -0700 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID:
Hi, based on what I see, your BIOS or FW update wiped the NSD descriptor by restoring a GPT table at the start of a disk that shouldn't have a GPT table to begin with, as it's under the control of GPFS. Future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. If you want this officially confirmed I would still open a PMR, but at that point, given that you don't seem to have any production data on it from what I see in your response, you should recreate the filesystem.
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------
From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org
Apologies Sven, w/o comments below:
--
#!/bin/ksh
CONTROLLER_REGEX='[ab]_lun[0-9]+'
for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX )
do
    echo mapper/$dev dmm
    #echo mapper/$dev generic
done
# Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover),
return 0
--
Best, Jared
From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate.
------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker To: gpfsug main discussion list Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. 
-- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef 
running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [ mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. 
Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. 
Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? 
Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jared.Baker at uwyo.edu Wed Oct 29 20:30:29 2014 From: Jared.Baker at uwyo.edu (Jared David Baker) Date: Wed, 29 Oct 2014 20:30:29 +0000 Subject: [gpfsug-discuss] Server lost NSD mappings In-Reply-To: References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> Message-ID: Thanks Sven, I appreciate the feedback. I'll be opening the PMR soon. Again, thanks for the information. Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, based on what i see is your BIOS or FW update wiped the NSD descriptor by restoring a GPT table on the start of a disk that shouldn't have a GPT table to begin with as its under control of GPFS. future releases of GPFS prevent this by writing our own GPT label to the disks so other tools don't touch them, but that doesn't help in your case any more. if you want this officially confirmed i would still open a PMR, but at that point given that you don't seem to have any production data on it from what i see in your response you should recreate the filesystem. 
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 01:13 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Apologies Sven, w/o comments below: -- #!/bin/ksh CONTROLLER_REGEX='[ab]_lun[0-9]+' for dev in $( /bin/ls /dev/mapper | egrep $CONTROLLER_REGEX ) do echo mapper/$dev dmm #echo mapper/$dev generic done # Bypass the GPFS disk discovery (/usr/lpp/mmfs/bin/mmdevdiscover), return 0 -- Best, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 2:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Hi, i was asking for the content, not the result :-) can you run cat /var/mmfs/etc/nsddevices the 2nd output confirms at least that there is no correct label on the disk, as it returns EFI on a GNR system you get a few more infos , but at least you should see the NSD descriptor string like i get on my system : [root at gss02n1 ~]# dd if=/dev/sdaa bs=1k count=32 | strings T7$V e2d2s08 NSD descriptor for /dev/sdde created by GPFS Thu Oct 9 16:48:27 2014 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.0186765 s, 1.8 MB/s while i still would like to see the nsddevices script i assume your NSD descriptor is wiped and without a lot of manual labor and at least a recent GPFS dump this is very hard if at all to recreate. ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:46 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Sven, output below: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 /]# dd if=/dev/dm-0 bs=1k count=32 | strings 32+0 records in 32+0 records out 32768 bytes (33 kB) copied, 0.000739083 s, 44.3 MB/s EFI PART system [root at mmmnsd5 /]# -- Thanks, Jared From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Sven Oehme Sent: Wednesday, October 29, 2014 1:41 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings can you please post the content of your nsddevices script ? also please run dd if=/dev/dm-0 bs=1k count=32 |strings and post the output thx. 
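For reference, the same check can be run across every GPFS LUN in one pass. This is only a rough sketch, assuming the device-mapper aliases follow the dcs3800u31[ab]_lun* pattern used in this thread: a healthy NSD should show an "NSD descriptor" string in its first 32 KiB, while an "EFI PART" string suggests a GPT label has been written over it.

--
#!/bin/bash
# Rough sketch: scan each GPFS LUN for an NSD descriptor in its first 32 KiB.
# The dcs3800u31[ab]_lun* glob is illustrative -- adjust to your multipath aliases.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sig=$(dd if="$dev" bs=1k count=32 2>/dev/null | strings | egrep -m1 'NSD descriptor|EFI PART')
    echo "$dev: ${sig:-no NSD descriptor or GPT signature found}"
done
--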
Sven ------------------------------------------ Sven Oehme Scalable Storage Research email: oehmes at us.ibm.com Phone: +1 (408) 824-8904 IBM Almaden Research Lab ------------------------------------------ From: Jared David Baker > To: gpfsug main discussion list > Date: 10/29/2014 12:27 PM Subject: Re: [gpfsug-discuss] Server lost NSD mappings Sent by: gpfsug-discuss-bounces at gpfsug.org ________________________________ Thanks Ed, I can see the multipath devices inside the OS after reboot. The storage is all SAS attached. Two servers which can see the multipath LUNS for failover, then export the gpfs filesystem to the compute cluster. -- [root at mmmnsd5 ~]# multipath -l dcs3800u31a_lun8 (360080e500029600c000001e953cf8291) dm-4 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:8 sdi 8:128 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:8 sdu 65:64 active undef running dcs3800u31b_lun9 (360080e5000295c68000001c253cf8221) dm-9 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:9 sdv 65:80 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:9 sdj 8:144 active undef running dcs3800u31a_lun6 (360080e500029600c000001e653cf8210) dm-3 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:6 sdg 8:96 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:6 sds 65:32 active undef running mpathm (3600605b007ca57d01b1b8a7a1a107bdd) dm-12 IBM,ServeRAID M1115 size=558G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=0 status=active `- 1:2:0:0 sdy 65:128 active undef running dcs3800u31b_lun7 (360080e5000295c68000001bd53cf81a9) dm-8 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:7 sdt 65:48 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:7 sdh 8:112 active undef running dcs3800u31a_lun10 (360080e500029600c000001ec53cf8301) dm-5 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:10 sdk 8:160 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:10 sdw 65:96 active undef running dcs3800u31a_lun4 (360080e500029600c000001e353cf8189) dm-1 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:4 sde 8:64 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:4 sdq 65:0 active undef running dcs3800u31b_lun5 (360080e5000295c68000001b853cf8125) dm-10 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:5 sdr 65:16 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:5 sdf 8:80 active undef running dcs3800u31a_lun2 (360080e500029600c000001e053cf80f9) dm-2 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:2 sdc 8:32 active undef running `-+- policy='round-robin 0' 
prio=0 status=enabled `- 0:0:1:2 sdo 8:224 active undef running dcs3800u31b_lun11 (360080e5000295c68000001c753cf828e) dm-11 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:11 sdx 65:112 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:11 sdl 8:176 active undef running dcs3800u31b_lun3 (360080e5000295c68000001b353cf8097) dm-6 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:3 sdp 8:240 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:3 sdd 8:48 active undef running dcs3800u31a_lun0 (360080e500029600c000001da53cf7ec1) dm-0 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:0:0 sda 8:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:1:0 sdm 8:192 active undef running dcs3800u31b_lun1 (360080e5000295c68000001ac53cf7e8d) dm-7 IBM,1813 FAStT size=29T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw |-+- policy='round-robin 0' prio=0 status=active | `- 0:0:1:1 sdn 8:208 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 0:0:0:1 sdb 8:16 active undef running [root at mmmnsd5 ~]# -- -- [root at mmmnsd5 ~]# cat /proc/partitions major minor #blocks name 8 48 31251951616 sdd 8 32 31251951616 sdc 8 80 31251951616 sdf 8 16 31251951616 sdb 8 128 31251951616 sdi 8 112 31251951616 sdh 8 96 31251951616 sdg 8 192 31251951616 sdm 8 240 31251951616 sdp 8 208 31251951616 sdn 8 144 31251951616 sdj 8 64 31251951616 sde 8 224 31251951616 sdo 8 160 31251951616 sdk 8 176 31251951616 sdl 65 0 31251951616 sdq 65 48 31251951616 sdt 65 16 31251951616 sdr 65 128 584960000 sdy 65 80 31251951616 sdv 65 96 31251951616 sdw 65 64 31251951616 sdu 65 112 31251951616 sdx 65 32 31251951616 sds 8 0 31251951616 sda 253 0 31251951616 dm-0 253 1 31251951616 dm-1 253 2 31251951616 dm-2 253 3 31251951616 dm-3 253 4 31251951616 dm-4 253 5 31251951616 dm-5 253 6 31251951616 dm-6 253 7 31251951616 dm-7 253 8 31251951616 dm-8 253 9 31251951616 dm-9 253 10 31251951616 dm-10 253 11 31251951616 dm-11 253 12 584960000 dm-12 253 13 524288 dm-13 253 14 16777216 dm-14 253 15 567657472 dm-15 [root at mmmnsd5 ~]# -- The NSDs had no failure group defined on creation. Regards, Jared -----Original Message----- From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Ed Wahl Sent: Wednesday, October 29, 2014 1:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Server lost NSD mappings Can you see the block devices from inside the OS after the reboot? I don't see where you mention this. How is the storage attached to the server? As a DCS37|800 can be FC/SAS/IB which is yours? Do the nodes share the storage? All nsds in same failure group? I was quickly brought to mind of a failed SRP_DAEMON lookup to IB storage from a badly updated IB card but I would hope you'd notice the lack of block devices. cat /proc/partitions ? multipath -l ? Our GPFS changes device mapper multipath names all the time (dm-127 one day, dm-something else another), so that is no problem. But wacking the volume label is a pain. When hardware dies if you have nsds sharing the same LUNs you can just transfer /var/mmfs/gen/mmsdrfs from another node and Bob's your uncle. 
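For anyone reading this later, the mmsdrfs transfer Ed describes can look roughly like the following. The hostnames are placeholders, and the mmsdrrestore variant is the supported way to resync a node's configuration; check the man page for the exact options on your release.

--
# Rough sketch of the recovery Ed describes, run from the node that lost its
# configuration. "nsd-good" is a placeholder for a healthy NSD server.
scp nsd-good:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/mmsdrfs

# Alternatively, let GPFS pull the configuration from a node that still has a
# valid copy (option names may vary slightly between releases):
mmsdrrestore -p nsd-good -R /usr/bin/scp
--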
Ed Wahl OSC ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Jared David Baker [Jared.Baker at uwyo.edu] Sent: Wednesday, October 29, 2014 11:31 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Server lost NSD mappings Hello all, I'm hoping that somebody can shed some light on a problem that I experienced yesterday. I've been working with GPFS for a couple months as an admin now, but I've come across a problem that I'm unable to see the answer to. Hopefully the solution is not listed somewhere blatantly on the web, but I spent a fair amount of time looking last night. Here is the situation: yesterday, I needed to update some firmware on a Mellanox HCA FDR14 card and reboot one of our GPFS servers and repeat for the sister node (IBM x3550 and DCS3850) as HPSS for our main campus cluster. However, upon reboot, the server seemed to lose the path mappings to the multipath devices for the NSDs. Output below: -- [root at mmmnsd5 ~]# mmlsnsd -m -f gscratch Disk name NSD volume ID Device Node name Remarks --------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- Also, the system was working fantastically before the reboot, but now I'm unable to mount the GPFS filesystem. The disk names look like they are there and mapped to the NSD volume ID, but there is no Device. 
I've created the /var/mmfs/etc/nsddevices script and it has the following output with user return 0: -- [root at mmmnsd5 ~]# /var/mmfs/etc/nsddevices mapper/dcs3800u31a_lun0 dmm mapper/dcs3800u31a_lun10 dmm mapper/dcs3800u31a_lun2 dmm mapper/dcs3800u31a_lun4 dmm mapper/dcs3800u31a_lun6 dmm mapper/dcs3800u31a_lun8 dmm mapper/dcs3800u31b_lun1 dmm mapper/dcs3800u31b_lun11 dmm mapper/dcs3800u31b_lun3 dmm mapper/dcs3800u31b_lun5 dmm mapper/dcs3800u31b_lun7 dmm mapper/dcs3800u31b_lun9 dmm [root at mmmnsd5 ~]# -- That output looks correct to me based on the documentation. So I went digging in the GPFS log file and found this relevant information: -- Tue Oct 28 23:44:48.405 2014: I/O to NSD disk, dcs3800u31a_lun0, fails. No such NSD locally found. Tue Oct 28 23:44:48.481 2014: I/O to NSD disk, dcs3800u31b_lun1, fails. No such NSD locally found. Tue Oct 28 23:44:48.555 2014: I/O to NSD disk, dcs3800u31a_lun2, fails. No such NSD locally found. Tue Oct 28 23:44:48.629 2014: I/O to NSD disk, dcs3800u31b_lun3, fails. No such NSD locally found. Tue Oct 28 23:44:48.703 2014: I/O to NSD disk, dcs3800u31a_lun4, fails. No such NSD locally found. Tue Oct 28 23:44:48.775 2014: I/O to NSD disk, dcs3800u31b_lun5, fails. No such NSD locally found. Tue Oct 28 23:44:48.844 2014: I/O to NSD disk, dcs3800u31a_lun6, fails. No such NSD locally found. Tue Oct 28 23:44:48.919 2014: I/O to NSD disk, dcs3800u31b_lun7, fails. No such NSD locally found. Tue Oct 28 23:44:48.989 2014: I/O to NSD disk, dcs3800u31a_lun8, fails. No such NSD locally found. Tue Oct 28 23:44:49.060 2014: I/O to NSD disk, dcs3800u31b_lun9, fails. No such NSD locally found. Tue Oct 28 23:44:49.128 2014: I/O to NSD disk, dcs3800u31a_lun10, fails. No such NSD locally found. Tue Oct 28 23:44:49.199 2014: I/O to NSD disk, dcs3800u31b_lun11, fails. No such NSD locally found. -- Okay, so the NSDs don't seem to be able to be found, so I attempt to rediscover the NSD by executing the command mmnsddiscover: -- [root at mmmnsd5 ~]# mmnsddiscover mmnsddiscover: Attempting to rediscover the disks. This may take a while ... mmnsddiscover: Finished. [root at mmmnsd5 ~]# -- I was hoping that finished, but then upon restarting GPFS, there was no success. 
Verifying with mmlsnsd -X -f gscratch -- [root at mmmnsd5 ~]# mmlsnsd -X -f gscratch Disk name NSD volume ID Device Devtype Node name Remarks --------------------------------------------------------------------------------------------------- dcs3800u31a_lun0 0A62001B54235577 - - mminsd5.infini (not found) server node dcs3800u31a_lun0 0A62001B54235577 - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd6.infini (not found) server node dcs3800u31a_lun10 0A62001C542355AA - - mminsd5.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd6.infini (not found) server node dcs3800u31a_lun2 0A62001C54235581 - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd5.infini (not found) server node dcs3800u31a_lun4 0A62001B5423558B - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd6.infini (not found) server node dcs3800u31a_lun6 0A62001C54235595 - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd5.infini (not found) server node dcs3800u31a_lun8 0A62001B5423559F - - mminsd6.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd5.infini (not found) server node dcs3800u31b_lun1 0A62001B5423557C - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd6.infini (not found) server node dcs3800u31b_lun11 0A62001C542355AF - - mminsd5.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd6.infini (not found) server node dcs3800u31b_lun3 0A62001C54235586 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd5.infini (not found) server node dcs3800u31b_lun5 0A62001B54235590 - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd6.infini (not found) server node dcs3800u31b_lun7 0A62001C5423559A - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd5.infini (not found) server node dcs3800u31b_lun9 0A62001B542355A4 - - mminsd6.infini (not found) server node [root at mmmnsd5 ~]# -- I'm wondering if somebody has seen this type of issue before? Will recreating my NSDs destroy the filesystem? I'm thinking that all the data is intact, but there is no crucial data on this file system yet, so I could recreate the file system, but I would like to learn how to solve a problem like this. Thanks for all help and information. Regards, Jared _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From jonathan at buzzard.me.uk  Wed Oct 29 20:32:25 2014
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 29 Oct 2014 20:32:25 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: 
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com>
Message-ID: <54514ED9.9030604@buzzard.me.uk>

On 29/10/14 20:25, Sven Oehme wrote:
> Hi,
>
> based on what i see is your BIOS or FW update wiped the NSD descriptor
> by restoring a GPT table on the start of a disk that shouldn't have a
> GPT table to begin with as its under control of GPFS.
> future releases of GPFS prevent this by writing our own GPT label to the
> disks so other tools don't touch them, but that doesn't help in your
> case any more. if you want this officially confirmed i would still open
> a PMR, but at that point given that you don't seem to have any
> production data on it from what i see in your response you should
> recreate the filesystem.
>

However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again...

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From Jared.Baker at uwyo.edu  Wed Oct 29 20:47:51 2014
From: Jared.Baker at uwyo.edu (Jared David Baker)
Date: Wed, 29 Oct 2014 20:47:51 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: <54514ED9.9030604@buzzard.me.uk>
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk>
Message-ID: 

Jonathan, which script are you talking about?

Thanks, Jared

-----Original Message-----
From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Jonathan Buzzard
Sent: Wednesday, October 29, 2014 2:32 PM
To: gpfsug-discuss at gpfsug.org
Subject: Re: [gpfsug-discuss] Server lost NSD mappings

On 29/10/14 20:25, Sven Oehme wrote:
> Hi,
>
> based on what i see is your BIOS or FW update wiped the NSD descriptor
> by restoring a GPT table on the start of a disk that shouldn't have a
> GPT table to begin with as its under control of GPFS.
> future releases of GPFS prevent this by writing our own GPT label to the
> disks so other tools don't touch them, but that doesn't help in your
> case any more. if you want this officially confirmed i would still open
> a PMR, but at that point given that you don't seem to have any
> production data on it from what i see in your response you should
> recreate the filesystem.
>

However before recreating the file system I would run the script to see if your disks have the secondary copy of the GPT partition table and if they do make sure it is wiped/removed *BEFORE* you go any further. Otherwise it could happen again...

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.
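For context, the backup copy of a GPT lives in the last sector of the disk, so a crude detection check looks roughly like the sketch below. This is not the developerWorks script Jonathan refers to later in the thread; it assumes 512-byte logical sectors and the multipath aliases used earlier, and it only detects, it does not clear anything.

--
#!/bin/bash
# Rough sketch: look for a backup GPT header in the last sector of each LUN.
# Detection only -- do not wipe anything until you are sure what it belongs to.
for dev in /dev/mapper/dcs3800u31[ab]_lun*; do
    sectors=$(blockdev --getsz "$dev")      # device size in 512-byte sectors
    if dd if="$dev" bs=512 skip=$((sectors - 1)) count=1 2>/dev/null | strings | grep -q 'EFI PART'; then
        echo "$dev: backup GPT header present"
    else
        echo "$dev: no backup GPT header found"
    fi
done
--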
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From jonathan at buzzard.me.uk  Wed Oct 29 21:01:06 2014
From: jonathan at buzzard.me.uk (Jonathan Buzzard)
Date: Wed, 29 Oct 2014 21:01:06 +0000
Subject: [gpfsug-discuss] Server lost NSD mappings
In-Reply-To: 
References: <3f0b084aedb24f76bfbbd5b6aa3067f4@CY1PR0501MB1259.namprd05.prod.outlook.com> <07589ce098224a2fb297571c547ee62d@CY1PR0501MB1259.namprd05.prod.outlook.com> <9615167cb0624262a718f477fab85830@CY1PR0501MB1259.namprd05.prod.outlook.com> <54514ED9.9030604@buzzard.me.uk>
Message-ID: <54515592.4050606@buzzard.me.uk>

On 29/10/14 20:47, Jared David Baker wrote:
> Jonathan, which script are you talking about?
>

The one here https://www.ibm.com/developerworks/community/forums/html/topic?id=32296bac-bfa1-45ff-9a43-08b0a36b17ef&ps=25

Use for detecting and clearing that secondary GPT table. Never used it of course; my disaster was caused by an idiot admin installing a new OS without mapping the disks out and then hitting yes yes yes when asked if he wanted to blank the disks, which the RHEL installer duly obliged. Then five days later I rebooted the last NSD server for an upgrade and BOOM, 50TB and 80 million files down the swanny.

JAB.

--
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

From mark.bergman at uphs.upenn.edu  Fri Oct 31 17:10:55 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 13:10:55 -0400
Subject: [gpfsug-discuss] mapping to hostname?
Message-ID: <25152-1414775455.156309@Pc2q.WYui.XCNm>

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name and the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark

From bevans at pixitmedia.com  Fri Oct 31 17:32:45 2014
From: bevans at pixitmedia.com (Barry Evans)
Date: Fri, 31 Oct 2014 17:32:45 +0000
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <5453C7BD.8030608@pixitmedia.com>

I'm sure there is a better way to do this, but old habits die hard. I tend to use 'mmfsadm saferdump tscomm' - connection details should be littered throughout.

Cheers,
Barry
ArcaStream/Pixit Media

mark.bergman at uphs.upenn.edu wrote:
> Many GPFS logs & utilities refer to nodes via their name.
>
> I haven't found an "mm*" executable that shows the mapping between that
> name and the hostname.
>
> Is there a simple method to map the designation to the node's
> hostname?
>
> Thanks,
>
> Mark
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email.
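If you go Barry's route, the tscomm dump is verbose, so filtering helps. This is only a sketch: mmfsadm is undocumented service tooling, its output format varies between releases, and the node designations may appear in different fields.

--
# Rough sketch: pull the connection lines that carry node designations.
mmfsadm saferdump tscomm | egrep 'c[0-9]+n[0-9]+'
--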
From oehmes at us.ibm.com  Fri Oct 31 18:20:40 2014
From: oehmes at us.ibm.com (Sven Oehme)
Date: Fri, 31 Oct 2014 11:20:40 -0700
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: <25152-1414775455.156309@Pc2q.WYui.XCNm>
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: 

Hi,

the official way to do this is mmdiag --network

thx. Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

From: mark.bergman at uphs.upenn.edu
To: gpfsug main discussion list
Date: 10/31/2014 10:11 AM
Subject: [gpfsug-discuss] mapping to hostname?
Sent by: gpfsug-discuss-bounces at gpfsug.org

Many GPFS logs & utilities refer to nodes via their name.

I haven't found an "mm*" executable that shows the mapping between that name and the hostname.

Is there a simple method to map the designation to the node's hostname?

Thanks,

Mark

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mark.bergman at uphs.upenn.edu  Fri Oct 31 18:57:44 2014
From: mark.bergman at uphs.upenn.edu (mark.bergman at uphs.upenn.edu)
Date: Fri, 31 Oct 2014 14:57:44 -0400
Subject: [gpfsug-discuss] mapping to hostname?
In-Reply-To: Your message of "Fri, 31 Oct 2014 11:20:40 -0700."
References: <25152-1414775455.156309@Pc2q.WYui.XCNm>
Message-ID: <9586-1414781864.388104@tEdB.dMla.tGDi>

In the message dated: Fri, 31 Oct 2014 11:20:40 -0700,
The pithy ruminations from Sven Oehme on to hostname?> were:

=> Hi,
=>
=> the official way to do this is mmdiag --network

OK. I'm now using: mmdiag --network | awk '{if ( $1 ~ /

=> thx. Sven
=>
=>
=> ------------------------------------------
=> Sven Oehme
=> Scalable Storage Research
=> email: oehmes at us.ibm.com
=> Phone: +1 (408) 824-8904
=> IBM Almaden Research Lab
=> ------------------------------------------
=>
=>
=>
=> From: mark.bergman at uphs.upenn.edu
=> To: gpfsug main discussion list
=> Date: 10/31/2014 10:11 AM
=> Subject: [gpfsug-discuss] mapping to hostname?
=> Sent by: gpfsug-discuss-bounces at gpfsug.org
=>
=>
=>
=> Many GPFS logs & utilities refer to nodes via their name.
=>
=> I haven't found an "mm*" executable that shows the mapping between that
=> name and the hostname.
=>
=> Is there a simple method to map the designation to the node's
=> hostname?
=>
=> Thanks,
=>
=> Mark
=>
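The designation being asked about is GPFS's internal node identifier, normally written in angle brackets (for example <c0n12>); the bracketed examples appear to have been stripped when these messages were archived, including the tail of Mark's awk one-liner above. A sketch of the kind of mapping he describes, assuming the designation and the admin node name share a line in 'mmdiag --network' output (the column layout varies by GPFS release, so adjust the fields to taste):

--
# Rough sketch: print "<c0nX>  hostname" pairs from mmdiag --network.
# Assumes both appear on the same line; field positions differ between releases.
mmdiag --network | awk '/<c[0-9]+n[0-9]+>/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^<c[0-9]+n[0-9]+>$/) print $i, $1
}'
--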