From Luke.Raimbach at crick.ac.uk Fri Jul 1 11:32:13 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Fri, 1 Jul 2016 10:32:13 +0000 Subject: [gpfsug-discuss] Trapped Inodes Message-ID: Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From makaplan at us.ibm.com Fri Jul 1 17:29:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 1 Jul 2016 12:29:31 -0400 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: References: Message-ID: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. 
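A quick way to see whether any snapshots remain and whether the fileset is still stuck in the Deleted state shown above is something like the following (a sketch only; 'gpfs' stands in for the real file system device name):

  # list any snapshots still present on the file system
  mmlssnapshot gpfs
  # show all filesets, including any stuck in "Deleted" state
  mmlsfileset gpfs -L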
Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Sat Jul 2 11:05:34 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Sat, 2 Jul 2016 10:05:34 +0000 Subject: [gpfsug-discuss] Trapped Inodes Message-ID: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? 
Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sat Jul 2 20:16:55 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sat, 2 Jul 2016 15:16:55 -0400 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. 
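Spelled out, the dummy-snapshot workaround suggested above is just the following (a sketch; again 'gpfs' is a placeholder for the actual device name):

  # create and immediately delete a throwaway snapshot to flush any hidden, half-deleted one
  mmcrsnapshot gpfs dummy
  mmdelsnapshot gpfs dummy
  # confirm nothing is left behind
  mmlssnapshot gpfs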
Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sun Jul 3 11:32:24 2016 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 3 Jul 2016 12:32:24 +0200 Subject: [gpfsug-discuss] Improving Testing Efficiency with IBM Spectrum Scale for Automated Driving Message-ID: In the press today: "Tesla Autopilot partner Mobileye comments on fatal crash, says tech isn?t meant to avoid this type of accident." http://electrek.co/2016/07/01/tesla-autopilot-mobileye-fatal-crash-comment/ "Tesla?s autopilot system was designed in-house and uses a fusion of dozens of internally- and externally-developed component technologies to determine the proper course of action in a given scenario. Since January 2016, Autopilot activates automatic emergency braking in response to any interruption of the ground plane in the path of the vehicle that cross-checks against a consistent radar signature. In the case of this accident, the high, white side of the box truck, combined with a radar signature that would have looked very similar to an overhead sign, caused automatic braking not to fire.? More testing is needed ! Finding a way to improve ADAS/AD testing throughput by factor. more HiL tests would have better helped to avoid this accident I guess, as white side box trucks are very common on the roads arent't they? So another strong reason to use GPFS/SpectrumScale/ESS filesystems to provide video files to paralell HiL stations for testing and verification using IBM AREMA for Automotive as essence system in order to find the relevant test cases. Facts: Currently most of the testing is done by copying large video files from some kind of "slow" NAS filer to the HiL stations and running the HiL test case from the internal HiL disks. A typical HiL test run takes 7-9min while the copy alone takes an additional 3-5 min upfront depending on the setup. Together with IBM partner SVA we tested to stream these video files from a ESS GL6 directly to the HiL stations without to copy them first. This worked well and the latency was fine and stable. 
As a result we could improve the number of HiL test cases per month by a good factor without adding more HiL hardware. See my presentation from the GPFS User Day at SPXXL 2016 in Garching February 17th 2016 9:00 - 9:30 Improving Testing Efficiency with IBM Spectrum Scale for Automated Driving https://www.spxxl.org/sites/default/files/GPFS-AREMA-TSM_et_al_for_ADAS_AD_Testing-Feb2016.pdf More: http://electrek.co/2015/10/14/tesla-reveals-all-the-details-of-its-autopilot-and-its-software-v7-0-slide-presentation-and-audio-conference/ -frank- P.S. HiL = Hardware in the Loop https://en.wikipedia.org/wiki/Hardware-in-the-loop_simulation Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach mailto:kraemerf at de.ibm.com voice: +49-(0)171-3043699 / +4970342741078 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Sun Jul 3 15:55:26 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Sun, 3 Jul 2016 14:55:26 +0000 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan > wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. 
--marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Jul 3 19:42:32 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 3 Jul 2016 14:42:32 -0400 Subject: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: mmdf statistics are not real-time accurate, there is a trade off in accuracy vs the cost of polling each node that might have the file system mounted. That said, here are some possibilities, in increasing order of impact on users and your possible desperation ;-) A1. Wait a while (at most a few minutes) and see if the mmdf stats are updated. A2. mmchmgr fs another-node may force new stats to be sent to the new fs manager. (Not sure but I expect it will.) B. Briefly quiesce the file system with: mmfsctl fs suspend; mmfsctl fs resume; C. If you have no users active ... I'm pretty sure mmumount fs -a ; mmmount fs -a; will clear the problem ... but there's always D. mmshutdown -a ; mmstartup -a E. 
If none of those resolve the situation something is hosed -- F. hope that mmfsck can fix it. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/03/2016 10:55 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. 
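Marc's escalation path above, gathered into one place as a rough sketch ('gpfs' and <another-node> are placeholders; each step is more disruptive than the last, so it is worth stopping as soon as the counters look sane again):

  mmchmgr gpfs <another-node>                 # A2: move the fs manager to force fresh statistics
  mmfsctl gpfs suspend; mmfsctl gpfs resume   # B: briefly quiesce the file system
  mmumount gpfs -a; mmmount gpfs -a           # C: only with no active users
  mmshutdown -a; mmstartup -a                 # D: full daemon recycle across the cluster
  mmfsck gpfs                                 # F: offline check, with the file system unmounted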
Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Mon Jul 4 10:44:02 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 4 Jul 2016 09:44:02 +0000 Subject: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: Hi Marc, Thanks again for the suggestions. An interesting report in the log while the another node took over managing the filesystem: Mon Jul 4 10:24:08.616 2016: [W] Inode space 10 in file system gpfs is approaching the limit for the maximum number of inodes. Inode space 10 was the independent fileset that the snapshot creation/deletion managed to remove. Still getting negative inode numbers reported after migrating manager functions and suspending/resuming the file system: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 I?ll have to wait until later today to try unmounting, daemon recycle or mmfsck. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 03 July 2016 19:43 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! mmdf statistics are not real-time accurate, there is a trade off in accuracy vs the cost of polling each node that might have the file system mounted. That said, here are some possibilities, in increasing order of impact on users and your possible desperation ;-) A1. 
Wait a while (at most a few minutes) and see if the mmdf stats are updated. A2. mmchmgr fs another-node may force new stats to be sent to the new fs manager. (Not sure but I expect it will.) B. Briefly quiesce the file system with: mmfsctl fs suspend; mmfsctl fs resume; C. If you have no users active ... I'm pretty sure mmumount fs -a ; mmmount fs -a; will clear the problem ... but there's always D. mmshutdown -a ; mmstartup -a E. If none of those resolve the situation something is hosed -- F. hope that mmfsck can fix it. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/03/2016 10:55 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan > wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. 
A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Tue Jul 5 15:25:06 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 5 Jul 2016 14:25:06 +0000 Subject: [gpfsug-discuss] Samba Export Anomalies Message-ID: Hi All, I'm having a frustrating time exporting an Independent Writer AFM fileset through Samba. Native GPFS directories exported through Samba seem to work properly, but when creating an export which points to an AFM IW fileset, I get "Access Denied" errors when trying to create files from an SMB client and even more unusual "Failed to enumerate objects in the container: Access is denied." messages if I try to modify the Access Control Entries through a Windows client. 
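Before digging into Samba itself, it may also be worth confirming that the cache fileset is healthy and comparing its on-disk ACL with the export that does work -- roughly like this (a sketch; device, fileset and paths as they appear in the configuration below):

  # AFM state of the cache fileset (it should not be Unmounted or Disconnected)
  mmafmctl general getstate -j crick.general.export.stp.stp-test
  # compare the ACL on the working export with the ACL on the AFM fileset junction
  mmgetacl /general/production
  mmgetacl /general/export/stp/stp-test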
Here is the smb.conf file:

***[BEGIN smb.conf]***

[global]
        idmap config * : backend = autorid
        idmap config * : range = 100000-999999
        idmap config THECRICK : backend = ad
        idmap config THECRICK : schema_mode = rfc2307
        idmap config THECRICK : range = 30000000-31999999
        local master = no
        realm = THECRICK.ORG
        security = ADS
        aio read size = 1
        aio write size = 1
        async smb echo handler = yes
        clustering = yes
        ctdbd socket = /var/run/ctdb/ctdbd.socket
        ea support = yes
        force unknown acl user = yes
        level2 oplocks = no
        log file = /var/log/samba/log.%m
        log level = 3
        map hidden = yes
        map readonly = no
        netbios name = MS_GENERAL
        printcap name = /etc/printcap
        printing = lprng
        server string = Samba Server Version %v
        socket options = TCP_NODELAY SO_KEEPALIVE TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15
        store dos attributes = yes
        strict allocate = yes
        strict locking = no
        unix extensions = no
        vfs objects = shadow_copy2 syncops fileid streams_xattr gpfs
        gpfs:dfreequota = yes
        gpfs:hsm = yes
        gpfs:leases = yes
        gpfs:prealloc = yes
        gpfs:sharemodes = yes
        gpfs:winattr = yes
        nfs4:acedup = merge
        nfs4:chown = yes
        nfs4:mode = simple
        notify:inotify = yes
        shadow:fixinodes = yes
        shadow:format = @GMT-%Y.%m.%d-%H.%M.%S
        shadow:snapdir = .snapshots
        shadow:snapdirseverywhere = yes
        shadow:sort = desc
        smbd:backgroundqueue = false
        smbd:search ask sharemode = false
        syncops:onmeta = no
        workgroup = THECRICK
        winbind enum groups = yes
        winbind enum users = yes

[production_rw]
        comment = Production writable
        path = /general/production
        read only = no

[stp-test]
        comment = STP Test Export
        path = /general/export/stp/stp-test
        read-only = no

***[END smb.conf]***

The [production_rw] export is a test directory on the /general filesystem which works from an SMB client. The [stp-test] export is an AFM fileset on the /general filesystem which is a cache of a directory in another GPFS filesystem:

***[BEGIN mmlsfileset general crick.general.export.stp.stp-test --afm -L]***

Attributes for fileset crick.general.export.stp.stp-test:
==========================================================
Status                            Linked
Path                              /general/export/stp/stp-test
Id                                1
Root inode                        1048579
Parent Id                         0
Created                           Fri Jul 1 15:56:48 2016
Comment
Inode space                       1
Maximum number of inodes          200000
Allocated inodes                  100000
Permission change flag            chmodAndSetacl
afm-associated                    Yes
Target                            gpfs:///camp/stp/stp-test
Mode                              independent-writer
File Lookup Refresh Interval      30 (default)
File Open Refresh Interval        30 (default)
Dir Lookup Refresh Interval       60 (default)
Dir Open Refresh Interval         60 (default)
Async Delay                       15 (default)
Last pSnapId                      0
Display Home Snapshots            no
Number of Gateway Flush Threads   4
Prefetch Threshold                0 (default)
Eviction Enabled                  yes (default)

***[END mmlsfileset general crick.general.export.stp.stp-test --afm -L]***

Anyone spot any glaringly obvious misconfigurations?

Cheers,
Luke.

Luke Raimbach
Senior HPC Data and Storage Systems Engineer,
The Francis Crick Institute,
Gibbs Building,
215 Euston Road,
London NW1 2BE.

E: luke.raimbach at crick.ac.uk
W: www.crick.ac.uk

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.

From bbanister at jumptrading.com Tue Jul 5 15:58:35 2016
From: bbanister at jumptrading.com (Bryan Banister)
Date: Tue, 5 Jul 2016 14:58:35 +0000
Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell?
In-Reply-To: <565240ad49e6476da9c1d3d11312f88c@mbxpsc1.winmail.deshaw.com>
References: <565240ad49e6476da9c1d3d11312f88c@mbxpsc1.winmail.deshaw.com>
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB061B491A@CHI-EXCHANGEW1.w2k.jumptrading.com>

Wanted to comment that we also hit this issue and agree with Paul that it would be nice in the FAQ to at least have something like the vertical bars that denote changed or added lines in a document, which are seen in the GPFS Admin guides. This should make it easy to see what has changed.

Would also be nice to "Follow this page" to get notifications of when the FAQ changes from my IBM Knowledge Center account... or maybe the person that publishes the changes could announce the update on the GPFS - Announce Developer Works page.
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001606

Cheers,
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sanchez, Paul
Sent: Friday, June 03, 2016 2:38 PM
To: gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)
Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell?

After some puzzling debugging on our new Broadwell servers, all of which slowly became brick-like after getting stuck starting GPFS, we discovered that this was already a known issue in the FAQ. Adding "nosmap" to the kernel command line in grub prevents SMAP from seeing the kernel-userspace memory interactions of GPFS as a reason to slowly grind all cores to a standstill, apparently spinning on stuck locks(?). (Big thanks go to RedHat for turning us on to the answer when we opened a case.)

From https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html, section 3.2:

Note: In order for IBM Spectrum Scale on RHEL 7 to run on the Haswell processor
* Disable the Supervisor Mode Access Prevention (smap) kernel parameter
* Reboot the RHEL 7 node before using GPFS

Some observations worth noting:
1. We've been running for a year with Haswell processors and have hundreds of Haswell RHEL7 nodes which do not exhibit this problem. So maybe this only really affects Broadwell CPUs?
2. It would be very nice for SpectrumScale to take a peek at /proc/cpuinfo and /proc/cmdline before starting up, and refuse to break the host when it has affected processors and kernel without "nosmap". Instead, an error message describing the fix would have made my day.
3. I'm going to have to start using a script to diff the FAQ for these gotchas, unless anyone knows of a better way to subscribe just to updates to this doc.

Thanks,
Paul Sanchez

________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
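Along the lines of Paul's point 2 above, the node-side check is easy to script -- a rough sketch (the grubby line is one RHEL 7 way of adding the parameter and should be verified against your own boot setup before use, followed by a reboot):

  # does this CPU advertise SMAP, and was the running kernel booted with nosmap?
  grep -o -m1 smap /proc/cpuinfo
  grep -o nosmap /proc/cmdline
  # one way to add nosmap to the kernel command line on RHEL 7, then reboot
  grubby --update-kernel=ALL --args="nosmap"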
-------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Tue Jul 5 19:31:28 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Tue, 5 Jul 2016 14:31:28 -0400 Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? FAQ Updates Message-ID: The PDF version of the FAQ ( http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/gpfsclustersfaq.pdf ) does have change bars. Also at the top it lists the questions that have been changed. Your suggestion for "announcing" new faq version does make sense and I'll email the one responsible for posting the faq. Thank you. Steve Duersch Spectrum Scale (GPFS) FVTest 845-433-7902 IBM Poughkeepsie, New York Message: 2 Date: Tue, 5 Jul 2016 14:58:35 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB061B491A at CHI-EXCHANGEW1.w2k.jumptrading.com> Content-Type: text/plain; charset="us-ascii" Wanted to comment that we also hit this issue and agree with Paul that it would be nice in the FAQ to at least have something like the vertical bars that denote changed or added lines in a document, which are seen in the GPFS Admin guides. This should make it easy to see what has changed. Would also be nice to "Follow this page" to get notifications of when the FAQ changes from my IBM Knowledge Center account... or maybe the person that publishes the changes could announce the update on the GPFS - Announce Developer Works page. https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001606 Cheers, -Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.arnold at unibas.ch Tue Jul 5 19:53:03 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Tue, 5 Jul 2016 20:53:03 +0200 Subject: [gpfsug-discuss] Samba Export Anomalies In-Reply-To: References: Message-ID: <577C020F.2080507@unibas.ch> Hi Luke, probably I don't have enough information about your AFM setup but maybe you could check the ACLs on the export as well as ACLs on the directory to be mounted. If you are using AFM from a home location that has ACLs set then they will also be transferred to cache location. (We ran into similar issues when we had to take over data from a SONAS system that was assigning gid_numbers from an internal mapping table - all had to be cleaned up first before clients could have access through our CES system.) Best Konstantin On 07/05/2016 04:25 PM, Luke Raimbach wrote: > Hi All, > > I'm having a frustrating time exporting an Independent Writer AFM fileset through Samba. > > Native GPFS directories exported through Samba seem to work properly, but when creating an export which points to an AFM IW fileset, I get "Access Denied" errors when trying to create files from an SMB client and even more unusual "Failed to enumerate objects in the container: Access is denied." messages if I try to modify the Access Control Entries through a Windows client. 
> > Here is the smb.conf file: > > ***[BEGIN smb.conf]*** > > [global] > idmap config * : backend = autorid > idmap config * : range = 100000-999999 > idmap config THECRICK : backend = ad > idmap config THECRICK : schema_mode = rfc2307 > idmap config THECRICK : range = 30000000-31999999 > local master = no > realm = THECRICK.ORG > security = ADS > aio read size = 1 > aio write size = 1 > async smb echo handler = yes > clustering = yes > ctdbd socket = /var/run/ctdb/ctdbd.socket > ea support = yes > force unknown acl user = yes > level2 oplocks = no > log file = /var/log/samba/log.%m > log level = 3 > map hidden = yes > map readonly = no > netbios name = MS_GENERAL > printcap name = /etc/printcap > printing = lprng > server string = Samba Server Version %v > socket options = TCP_NODELAY SO_KEEPALIVE TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15 > store dos attributes = yes > strict allocate = yes > strict locking = no > unix extensions = no > vfs objects = shadow_copy2 syncops fileid streams_xattr gpfs > gpfs:dfreequota = yes > gpfs:hsm = yes > gpfs:leases = yes > gpfs:prealloc = yes > gpfs:sharemodes = yes > gpfs:winattr = yes > nfs4:acedup = merge > nfs4:chown = yes > nfs4:mode = simple > notify:inotify = yes > shadow:fixinodes = yes > shadow:format = @GMT-%Y.%m.%d-%H.%M.%S > shadow:snapdir = .snapshots > shadow:snapdirseverywhere = yes > shadow:sort = desc > smbd:backgroundqueue = false > smbd:search ask sharemode = false > syncops:onmeta = no > workgroup = THECRICK > winbind enum groups = yes > winbind enum users = yes > > [production_rw] > comment = Production writable > path = /general/production > read only = no > > [stp-test] > comment = STP Test Export > path = /general/export/stp/stp-test > read-only = no > > ***[END smb.conf]*** > > > The [production_rw] export is a test directory on the /general filesystem which works from an SMB client. The [stp-test] export is an AFM fileset on the /general filesystem which is a cache of a directory in another GPFS filesystem: > > > ***[BEGIN mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** > > Attributes for fileset crick.general.export.stp.stp-test: > ========================================================== > Status Linked > Path /general/export/stp/stp-test > Id 1 > Root inode 1048579 > Parent Id 0 > Created Fri Jul 1 15:56:48 2016 > Comment > Inode space 1 > Maximum number of inodes 200000 > Allocated inodes 100000 > Permission change flag chmodAndSetacl > afm-associated Yes > Target gpfs:///camp/stp/stp-test > Mode independent-writer > File Lookup Refresh Interval 30 (default) > File Open Refresh Interval 30 (default) > Dir Lookup Refresh Interval 60 (default) > Dir Open Refresh Interval 60 (default) > Async Delay 15 (default) > Last pSnapId 0 > Display Home Snapshots no > Number of Gateway Flush Threads 4 > Prefetch Threshold 0 (default) > Eviction Enabled yes (default) > > ***[END mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** > > > Anyone spot any glaringly obvious misconfigurations? > > Cheers, > Luke. > > Luke Raimbach? > Senior HPC Data and Storage Systems Engineer, > The Francis Crick Institute, > Gibbs Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. 
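Following Konstantin's suggestion, a first step might be to compare what AFM has brought across from home with what is visible at the cache -- for example (a sketch; the home path is taken from the fileset's Target above and may differ in your setup):

  # at the home cluster
  mmgetacl /camp/stp/stp-test
  # at the cache cluster holding the Samba export
  mmgetacl /general/export/stp/stp-test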
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

From r.sobey at imperial.ac.uk Wed Jul 6 10:37:29 2016
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Wed, 6 Jul 2016 09:37:29 +0000
Subject: [gpfsug-discuss] Snapshots / Windows previous versions
In-Reply-To:
References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu>
Message-ID:

Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and came up with the following:

[2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name)
  check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout
[2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode)
  unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644
[2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir)
  user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots
[2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data)
  access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots
[2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl)
  FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED.
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum)
  155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$

Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn't run this command on a CTDB node so the UID mapping isn't working).

[root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06
#NFSv4 ACL
#owner:root
#group:root
group:74036:r-x-:allow:FileInherit:DirInherit:Inherited
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

user:39367:rwxc:allow:FileInherit:DirInherit:Inherited
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
 (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 20 June 2016 16:03
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions

Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us.

Samba is the only thing we've changed recently after the badlock debacle so I'm tempted to blame that, but who knows.

If (when) I find out I'll let everyone know.

Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L
Sent: 20 June 2016 15:56
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions

Hi Richard,

I can't answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar.
It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Jul 6 10:47:16 2016 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 6 Jul 2016 10:47:16 +0100 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> Message-ID: <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: > > Quick followup on this. Doing some more samba debugging (i.e. > increasing log levels!) 
and come up with the following: > > [2016/07/06 10:07:35.602080, 3] > ../source3/smbd/vfs.c:1322(check_reduced_name) > > check_reduced_name: > admin/ict/serviceoperations/slough_project/Slough_Layout reduced to > /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout > > [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) > > unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) > returning 0644 > > [2016/07/06 10:07:35.613374, 0] > ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) > > * user does not have list permission on snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots* > > [2016/07/06 10:07:35.613416, 0] > ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) > > access denied on listing snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots > > [2016/07/06 10:07:35.613434, 0] > ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) > > FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, > failed - NT_STATUS_ACCESS_DENIED. > > [2016/07/06 10:07:47.648557, 3] > ../source3/smbd/service.c:1138(close_cnum) > > 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to > service IPC$ > > Any takers? I cannot run mmgetacl on the .snapshots folder at all, as > root. A snapshot I just created to make sure I had full control on the > folder: (39367 is me, I didn?t run this command on a CTDB node so the > UID mapping isn?t working). > > [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 > > #NFSv4 ACL > > #owner:root > > #group:root > > group:74036:r-x-:allow:FileInherit:DirInherit:Inherited > > (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > > (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL > (-)WRITE_ATTR (-)WRITE_NAMED > > user:39367:rwxc:allow:FileInherit:DirInherit:Inherited > > (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > > (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL > (X)WRITE_ATTR (X)WRITE_NAMED > > *From:*gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of > *Sobey, Richard A > *Sent:* 20 June 2016 16:03 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but > our customers have come to like previous versions and indeed it is > sort of a selling point for us. > > Samba is the only thing we?ve changed recently after the badlock > debacle so I?m tempted to blame that, but who knows. > > If (when) I find out I?ll let everyone know. > > Richard > > *From:*gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of > *Buterbaugh, Kevin L > *Sent:* 20 June 2016 15:56 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Hi Richard, > > I can?t answer your question but I can tell you that we have > experienced either the exact same thing you are or something very > similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 > and it persists even after upgraded to GPFS 4.2.0.3 and the very > latest sernet-samba. > > And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* > upgrade SAMBA versions at that time. Therefore, I believe that > something changed in GPFS. That doesn?t mean it?s GPFS? 
fault, of > course. SAMBA may have been relying on a > bugundocumented feature in GPFS that IBM fixed > for all I know, and I?m obviously speculating here. > > The problem we see is that the .snapshots directory in each folder can > be cd?d to but is empty. The snapshots are all there, however, if you: > > cd //.snapshots/ taken>/rest/of/path/to/folder/in/question > > This obviously prevents users from being able to do their own recovery > of files unless you do something like what you describe, which we are > unwilling to do for security reasons. We have a ticket open with DDN? > > Kevin > > On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > > wrote: > > Hi all > > Can someone clarify if the ability for Windows to view snapshots > as Previous Versions is exposed by SAMBA or GPFS? Basically, if > suddenly my users cannot restore files from snapshots over a CIFS > share, where should I be looking? > > I don?t know when this problem occurred, but within the last few > weeks certainly our users with full control over their data now > see no previous versions available, but if we export their fileset > and set ?force user = root? all the snapshots are available. > > I think the answer is SAMBA, right? We?re running GPFS 3.5 and > sernet-samba 4.2.9. > > Many thanks > > Richard > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss atspectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research and > Education > > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 10:55:14 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 09:55:14 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Sure. 
It might be easier if I just post the entire smb.conf:

[global]
netbios name = store
workgroup = IC
security = ads
realm = IC.AC.UK
kerberos method = secrets and keytab
vfs objects = shadow_copy2 syncops gpfs fileid
ea support = yes
store dos attributes = yes
map readonly = no
map archive = no
map system = no
map hidden = no
unix extensions = no
allocation roundup size = 1048576
disable netbios = yes
smb ports = 445
# server signing = mandatory
template shell = /bin/bash
interfaces = bond2 lo bond0
allow trusted domains = no
printing = bsd
printcap name = /dev/null
load printers = no
disable spoolss = yes
idmap config IC : default = yes
idmap config IC : cache time = 180
idmap config IC : backend = ad
idmap config IC : schema_mode = rfc2307
idmap config IC : range = 500 - 2000000
idmap config * : range = 3000000 - 3500000
idmap config * : backend = tdb2
winbind refresh tickets = yes
winbind nss info = rfc2307
winbind use default domain = true
winbind offline logon = true
winbind separator = /
winbind enum users = true
winbind enum groups = true
winbind nested groups = yes
winbind expand groups = 2
winbind max clients = 10000
clustering = yes
ctdbd socket = /tmp/ctdb.socket
gpfs:sharemodes = yes
gpfs:winattr = yes
gpfs:leases = yes
gpfs:dfreequota = yes
# nfs4:mode = special
# nfs4:chown = no
nfs4:chown = yes
nfs4:mode = simple
nfs4:acedup = merge
fileid:algorithm = fsname
force unknown acl user = yes
shadow:snapdir = .snapshots
shadow:fixinodes = yes
shadow:snapdirseverywhere = yes
shadow:sort = desc
syncops:onclose = no
syncops:onmeta = no
kernel oplocks = yes
level2 oplocks = yes
oplocks = yes
notify:inotify = no
wide links = no
async smb echo handler = yes
smbd:backgroundqueue = False
use sendfile = no
dmapi support = yes
aio write size = 1
aio read size = 1
enable core files = no
#debug logging
log level = 2
log file = /var/log/samba.%m
max log size = 1024
debug timestamp = yes

[IC]
comment = Unified Group Space Area
path = /gpfs/prd/groupspace/ic
public = no
read only = no
valid users = "@domain users"

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans
Sent: 06 July 2016 10:47
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions

Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf?

On 06/07/2016 10:37, Sobey, Richard A wrote:
Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following:

[2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name)
check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout
[2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode)
unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644
[2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir)
user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots
[2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data)
access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots
[2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl)
FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED.
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn't run this command on a CTDB node so the UID mapping isn't working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we've changed recently after the badlock debacle so I'm tempted to blame that, but who knows. If (when) I find out I'll let everyone know. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can't answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn't mean it's GPFS' fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I'm obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd'd to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN... Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don't know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set "force user = root" all the snapshots are available. I think the answer is SAMBA, right? We're running GPFS 3.5 and sernet-samba 4.2.9. 
Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com [http://pixitmedia.com/sig/sig-cio.jpg] This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Jul 6 12:50:56 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 6 Jul 2016 11:50:56 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 13:22:53 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 12:22:53 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch, (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = 
Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From christof.schmitt at us.ibm.com  Wed Jul  6 15:45:57 2016
From: christof.schmitt at us.ibm.com (Christof Schmitt)
Date: Wed, 6 Jul 2016 07:45:57 -0700
Subject: [gpfsug-discuss] Snapshots / Windows previous versions
In-Reply-To:
References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com>
Message-ID:

The message in the trace confirms that this is triggered by:
https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231

I suspect that the Samba version used misses the patch
https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a

The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here?

Regards,

Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ
christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469)

From: "Sobey, Richard A"
To: gpfsug main discussion list
Date: 07/06/2016 05:23 AM
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Thanks Daniel - sorry to be dense, but does this indicate working as intended, or a bug? I assume the former.

So, the question still remains how has this suddenly broken, when:

[root at server ict]# mmgetacl -k nfs4 .snapshots/
.snapshots/: Operation not permitted

...appears to be the correct output and is consistent with someone else's GPFS cluster where it is working.

Cheers
Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger
Sent: 06 July 2016 12:51
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions

Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point.
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From r.sobey at imperial.ac.uk  Wed Jul  6 15:54:25 2016
From: r.sobey at imperial.ac.uk (Sobey, Richard A)
Date: Wed, 6 Jul 2016 14:54:25 +0000
Subject: [gpfsug-discuss] Snapshots / Windows previous versions
In-Reply-To:
References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com>
Message-ID:

Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affected by the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :)

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt
Sent: 06 July 2016 15:46
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions

The message in the trace confirms that this is triggered by:
https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231

I suspect that the Samba version used misses the patch
https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a

The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here?

Regards,

Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ
christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469)

From: "Sobey, Richard A"
To: gpfsug main discussion list
Date: 07/06/2016 05:23 AM
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Thanks Daniel - sorry to be dense, but does this indicate working as intended, or a bug? I assume the former.

So, the question still remains how has this suddenly broken, when:

[root at server ict]# mmgetacl -k nfs4 .snapshots/
.snapshots/: Operation not permitted

...appears to be the correct output and is consistent with someone else's GPFS cluster where it is working.

Cheers
Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger
Sent: 06 July 2016 12:51
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions

Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point.
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 6 16:21:06 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 6 Jul 2016 15:21:06 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 16:23:16 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 15:23:16 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Message-ID: I?m afraid it?s not a typo ? [root at server gpfs]# rpm -qa | grep sernet sernet-samba-ctdb-tests-4.2.9-19.el6.x86_64 sernet-samba-common-4.2.9-19.el6.x86_64 sernet-samba-winbind-4.2.9-19.el6.x86_64 sernet-samba-ad-4.2.9-19.el6.x86_64 sernet-samba-libs-4.2.9-19.el6.x86_64 sernet-samba-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient0-4.2.9-19.el6.x86_64 sernet-samba-ctdb-4.2.9-19.el6.x86_64 sernet-samba-libwbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-client-4.2.9-19.el6.x86_64 sernet-samba-debuginfo-4.2.9-19.el6.x86_64 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2016 16:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? 
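To answer the earlier question of which Samba releases are affected by the first commit and which already contain the second, the upstream git tree can be queried directly, and the same tree can supply the patch for a test build of 4.2.9. A rough sketch only; the clone URL, the gitweb patch URL form and the source paths are assumptions rather than verified instructions:

    # which upstream release tags contain each commit
    git clone https://git.samba.org/samba.git && cd samba
    git tag --contains acbb4ddb6876c15543c5370e6d27faacebc8a231   # releases with the snapdir permission check
    git tag --contains fdbca5e13a0375d7f18639679a627e67c3df647a   # releases that also carry the follow-up fix

    # rough test build of a 4.2.9 source tree with the follow-up commit applied
    cd samba-4.2.9    # unpacked source from the matching sernet SRPM (example path)
    wget -O snapdir-fix.patch 'https://git.samba.org/?p=samba.git;a=patch;h=fdbca5e13a0375d7f18639679a627e67c3df647a'
    patch -p1 < snapdir-fix.patch
    ./configure && make

The other route Christof describes later in the thread, switching to the CES-packaged Samba that ships with Spectrum Scale, avoids the rebuild entirely.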
Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. ) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. 
It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. 
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. 
Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 16:26:36 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 15:26:36 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Message-ID: By the way, we are planning to go to CES / 4.2.x in a matter of weeks, but understanding this problem was quite important for me. Perhaps knowing now that the fix is probably to install a different version of Samba, we?ll probably leave it alone. Thank you everyone for your help, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 06 July 2016 16:23 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions I?m afraid it?s not a typo ? 
[root at server gpfs]# rpm -qa | grep sernet sernet-samba-ctdb-tests-4.2.9-19.el6.x86_64 sernet-samba-common-4.2.9-19.el6.x86_64 sernet-samba-winbind-4.2.9-19.el6.x86_64 sernet-samba-ad-4.2.9-19.el6.x86_64 sernet-samba-libs-4.2.9-19.el6.x86_64 sernet-samba-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient0-4.2.9-19.el6.x86_64 sernet-samba-ctdb-4.2.9-19.el6.x86_64 sernet-samba-libwbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-client-4.2.9-19.el6.x86_64 sernet-samba-debuginfo-4.2.9-19.el6.x86_64 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2016 16:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = 
Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc.ken.tw25qn at gmail.com Wed Jul 6 16:37:56 2016 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Wed, 6 Jul 2016 16:37:56 +0100 Subject: [gpfsug-discuss] Snapshots / Windows previous versions Message-ID: On 6 Jul 2016 15:46, "Christof Schmitt" wrote: > > The message in the trace confirms that this is triggered by: > https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 > > I suspect that the Samba version used misses the patch > https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a > > The CES build of Samba shipped in Spectrum Scale includes the mentioned > patch, and that should avoid the problem seen. Would it be possible to > build Samba again with the mentioned patch to test whether that fixes the > issue seen here? > > Regards, > > Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ > christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) > > > > From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Thanks Daniel ? sorry to be dense, but does this indicate working as > intended, or a bug? I assume the former. So, the question still remains > how has this suddenly broken, when: > > [root at server ict]# mmgetacl -k nfs4 .snapshots/ > .snapshots/: Operation not permitted > > ?appears to be the correct output and is consistent with someone else?s > GPFS cluster where it is working. > > Cheers > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org [ > mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel > Kidger > Sent: 06 July 2016 12:51 > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Looking at recent patches to SAMBA I see from December 2015: > https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch > , > (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which > includes the comment: > Failing that, smbd_check_access_rights should check Unix perms at that > point. 
> ) > > diff --git a/source3/modules/vfs_shadow_copy2.c > b/source3/modules/vfs_shadow_copy2.c > index fca05cf..07e2f8a 100644 > --- a/source3/modules/vfs_shadow_copy2.c > +++ b/source3/modules/vfs_shadow_copy2.c > @@ -30,6 +30,7 @@ > */ > > #include "includes.h" > +#include "smbd/smbd.h" > #include "system/filesys.h" > #include "include/ntioctl.h" > #include > @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct > *handle, > return NULL; > } > > +static bool check_access_snapdir(struct vfs_handle_struct *handle, > + const char *path) > +{ > + struct smb_filename smb_fname; > + int ret; > + NTSTATUS status; > + > + ZERO_STRUCT(smb_fname); > + smb_fname.base_name = talloc_asprintf(talloc_tos(), > + "%s", > + path); > + if (smb_fname.base_name == NULL) { > + return false; > + } > + > + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); > + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { > + TALLOC_FREE(smb_fname.base_name); > + return false; > + } > + > + status = smbd_check_access_rights(handle->conn, > + &smb_fname, > + false, > + SEC_DIR_LIST); > + if (!NT_STATUS_IS_OK(status)) { > + DEBUG(0,("user does not have list permission " > + "on snapdir %s\n", > + smb_fname.base_name)); > + TALLOC_FREE(smb_fname.base_name); > + return false; > + } > + TALLOC_FREE(smb_fname.base_name); > + return true; > +} > + > > Daniel > > > > > > Dr Daniel Kidger > IBM Technical Sales Specialist > Software Defined Solution Sales > > +44-07818 522 266 > daniel.kidger at uk.ibm.com > > > > > > > ----- Original message ----- > From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > Date: Wed, Jul 6, 2016 10:55 AM > > Sure. 
It might be easier if I just post the entire smb.conf: > > [global] > netbios name = store > workgroup = IC > security = ads > realm = IC.AC.UK > kerberos method = secrets and keytab > > vfs objects = shadow_copy2 syncops gpfs fileid > ea support = yes > store dos attributes = yes > map readonly = no > map archive = no > map system = no > map hidden = no > unix extensions = no > allocation roundup size = 1048576 > > disable netbios = yes > smb ports = 445 > # server signing = mandatory > > template shell = /bin/bash > interfaces = bond2 lo bond0 > allow trusted domains = no > > printing = bsd > printcap name = /dev/null > load printers = no > disable spoolss = yes > > idmap config IC : default = yes > idmap config IC : cache time = 180 > idmap config IC : backend = ad > idmap config IC : schema_mode = rfc2307 > idmap config IC : range = 500 - 2000000 > idmap config * : range = 3000000 - 3500000 > idmap config * : backend = tdb2 > winbind refresh tickets = yes > winbind nss info = rfc2307 > winbind use default domain = true > winbind offline logon = true > winbind separator = / > winbind enum users = true > winbind enum groups = true > winbind nested groups = yes > winbind expand groups = 2 > > winbind max clients = 10000 > > clustering = yes > ctdbd socket = /tmp/ctdb.socket > gpfs:sharemodes = yes > gpfs:winattr = yes > gpfs:leases = yes > gpfs:dfreequota = yes > # nfs4:mode = special > # nfs4:chown = no > nfs4:chown = yes > nfs4:mode = simple > > nfs4:acedup = merge > fileid:algorithm = fsname > force unknown acl user = yes > > shadow:snapdir = .snapshots > shadow:fixinodes = yes > shadow:snapdirseverywhere = yes > shadow:sort = desc > > syncops:onclose = no > syncops:onmeta = no > kernel oplocks = yes > level2 oplocks = yes > oplocks = yes > notify:inotify = no > wide links = no > async smb echo handler = yes > smbd:backgroundqueue = False > use sendfile = no > dmapi support = yes > > aio write size = 1 > aio read size = 1 > > enable core files = no > > #debug logging > log level = 2 > log file = /var/log/samba.%m > max log size = 1024 > debug timestamp = yes > > [IC] > comment = Unified Group Space Area > path = /gpfs/prd/groupspace/ic > public = no > read only = no > valid users = "@domain users" > > From: gpfsug-discuss-bounces at spectrumscale.org [ > mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans > Sent: 06 July 2016 10:47 > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Can you cut/paste your full VFS options for gpfs and shadow copy from > smb.conf? > > On 06/07/2016 10:37, Sobey, Richard A wrote: > Quick followup on this. Doing some more samba debugging (i.e. increasing > log levels!) 
and come up with the following: > > [2016/07/06 10:07:35.602080, 3] > ../source3/smbd/vfs.c:1322(check_reduced_name) > check_reduced_name: > admin/ict/serviceoperations/slough_project/Slough_Layout reduced to > /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout > [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) > unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) > returning 0644 > [2016/07/06 10:07:35.613374, 0] > ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) > user does not have list permission on snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots > [2016/07/06 10:07:35.613416, 0] > ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) > access denied on listing snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots > [2016/07/06 10:07:35.613434, 0] > ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) > FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed > - NT_STATUS_ACCESS_DENIED. > [2016/07/06 10:07:47.648557, 3] > ../source3/smbd/service.c:1138(close_cnum) > 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service > IPC$ > > Any takers? I cannot run mmgetacl on the .snapshots folder at all, as > root. A snapshot I just created to make sure I had full control on the > folder: (39367 is me, I didn?t run this command on a CTDB node so the UID > mapping isn?t working). > > [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 > #NFSv4 ACL > #owner:root > #group:root > group:74036:r-x-:allow:FileInherit:DirInherit:Inherited > (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL > (-)WRITE_ATTR (-)WRITE_NAMED > > user:39367:rwxc:allow:FileInherit:DirInherit:Inherited > (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL > (X)WRITE_ATTR (X)WRITE_NAMED > > From: gpfsug-discuss-bounces at spectrumscale.org [ > mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, > Richard A > Sent: 20 June 2016 16:03 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our > customers have come to like previous versions and indeed it is sort of a > selling point for us. > > Samba is the only thing we?ve changed recently after the badlock debacle > so I?m tempted to blame that, but who knows. > > If (when) I find out I?ll let everyone know. > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org [ > mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, > Kevin L > Sent: 20 June 2016 15:56 > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Hi Richard, > > I can?t answer your question but I can tell you that we have experienced > either the exact same thing you are or something very similar. It > occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists > even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. > > And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* > upgrade SAMBA versions at that time. Therefore, I believe that something > changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. 
SAMBA > may have been relying on a bugundocumented feature > in GPFS that IBM fixed for all I know, and I?m obviously speculating here. > > The problem we see is that the .snapshots directory in each folder can be > cd?d to but is empty. The snapshots are all there, however, if you: > > cd //.snapshots/ taken>/rest/of/path/to/folder/in/question > > This obviously prevents users from being able to do their own recovery of > files unless you do something like what you describe, which we are > unwilling to do for security reasons. We have a ticket open with DDN? > > Kevin > > On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: > > Hi all > > Can someone clarify if the ability for Windows to view snapshots as > Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my > users cannot restore files from snapshots over a CIFS share, where should > I be looking? > > I don?t know when this problem occurred, but within the last few weeks > certainly our users with full control over their data now see no previous > versions available, but if we export their fileset and set ?force user = > root? all the snapshots are available. > > I think the answer is SAMBA, right? We?re running GPFS 3.5 and > sernet-samba 4.2.9. > > Many thanks > > Richard > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- > Barry Evans > Technical Director & Co-Founder > Pixit Media > Mobile: +44 (0)7950 666 248 > http://www.pixitmedia.com > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From christof.schmitt at us.ibm.com Wed Jul 6 17:19:40 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 6 Jul 2016 09:19:40 -0700 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: The first patch is at least in Samba 4.2 and newer. The patch to the vfs_gpfs module is only in Samba 4.3 and newer. So any of these should fix your problem: - Add the vfs_gpfs patch to the source code of Samba 4.2.9 and recompile the code. - Upgrade to Sernet Samba 4.3.x or newer - Change the Samba services to the ones provided through CES Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 07:54 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Thu Jul 7 14:00:17 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:00:17 +0000 Subject: [gpfsug-discuss] Migration policy confusion Message-ID: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Thu Jul 7 14:10:52 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 7 Jul 2016 13:10:52 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: An HTML attachment was scrubbed... 
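Before moving real data with a rule like 'go_gold', a dry run shows which files would be selected and what the pool occupancy would look like afterwards; Marc Kaplan expands on the -I and -L options further down this thread. A minimal sketch, assuming the file system is called fs1 and the rule above is saved in /tmp/gold.pol:

   mmapplypolicy fs1 -P /tmp/gold.pol -I test -L 3   # evaluate only: list candidates and predicted occupancy, move nothing
   mmapplypolicy fs1 -P /tmp/gold.pol -I yes  -L 2   # perform the migration, listing each chosen file
   mmdf fs1 -P GOLD                                  # per-pool occupancy afterwards (the numbers can lag a little)

The [I] occupancy summary that mmapplypolicy prints before and after its run is the quickest confirmation that bytes, not just pool attributes, actually moved.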
URL: From olaf.weiser at de.ibm.com Thu Jul 7 14:12:12 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 7 Jul 2016 15:12:12 +0200 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 14:16:19 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:16:19 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <640419CE-E989-47CD-999D-65EC249C9B8A@siriuscom.com> Olaf, thanks. Yes the plan is to have SSD?s for the system pool ultimately but this is just a test system that I?m using to try and understand teiring better. The files (10 or so of them) are each 200MB in size. Mark From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Migration policy confusion HI , first of all, given by the fact, that the MetaData is stored in system pool .. system should be the "fastest" pool / underlaying disks ... you have.. with a "slow" access to the MD, access to data is very likely affected.. (except for cached data, where MD is cached) in addition.. tell us, how "big" your test files are ? .. you moved by mmapplypolicy Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 03:00 PM Subject: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. 
Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 14:16:53 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:16:53 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. 
Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Thu Jul 7 14:18:41 2016 From: service at metamodul.com (- -) Date: Thu, 7 Jul 2016 15:18:41 +0200 (CEST) Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <318576846.22999.a23b5e71-bef0-4fc7-9542-12ecb401ec9e.open-xchange@email.1und1.de> An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 7 15:20:12 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 7 Jul 2016 10:20:12 -0400 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Message-ID: At the very least, LOOK at the messages output by the mmapplypolicy command at the beginning and end. The "occupancy" stats for each pool are shown BEFORE and AFTER the command does its work. In even more detail, it shows you how many files and how many KB of data were (or will be or would be) migrated. Also, options matter. ReadTheFineManuals. -I test vs -I defer vs -I yes. To see exactly which files are being migrated, use -L 2 To see exactly which files are being selected by your rule(s), use -L 3 And for more details about the files being skipped over, etc, etc, -L 6 Gee, I just checked the doc myself, I forgot some of the details and it's pretty good. Admittedly mmapplypolicy is a complex command. You can do somethings simply, only knowing a few options and policy rules, BUT... As my father used to say, "When all else fails, read the directions!" -L n Controls the level of information displayed by the mmapplypolicy command. Larger values indicate the display of more detailed information. These terms are used: candidate file A file that matches a MIGRATE, DELETE, or LIST policy rule. chosen file A candidate file that has been scheduled for action. 
These are the valid values for n: 0 Displays only serious errors. 1 Displays some information as the command runs, but not for each file. This is the default. 2 Displays each chosen file and the scheduled migration or deletion action. 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. For examples and more information on this flag, see the section: The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide. --marc From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 09:17 AM Subject: Re: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 15:30:33 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 14:30:33 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Message-ID: <877D722D-8CF5-496F-AAE5-7C0190E54D50@siriuscom.com> Thanks all. I realized that my file creation command was building 200k size files instead of the 200MB files. I fixed that and now I see the mmapplypolicy command take a bit more time and show accurate data as well as my bytes are now on the proper NSDs. It?s always some little thing that the human messes up isn?t it? ? From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 9:20 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Migration policy confusion At the very least, LOOK at the messages output by the mmapplypolicy command at the beginning and end. The "occupancy" stats for each pool are shown BEFORE and AFTER the command does its work. In even more detail, it shows you how many files and how many KB of data were (or will be or would be) migrated. Also, options matter. ReadTheFineManuals. -I test vs -I defer vs -I yes. To see exactly which files are being migrated, use -L 2 To see exactly which files are being selected by your rule(s), use -L 3 And for more details about the files being skipped over, etc, etc, -L 6 Gee, I just checked the doc myself, I forgot some of the details and it's pretty good. Admittedly mmapplypolicy is a complex command. You can do somethings simply, only knowing a few options and policy rules, BUT... As my father used to say, "When all else fails, read the directions!" -L n Controls the level of information displayed by the mmapplypolicy command. Larger values indicate the display of more detailed information. These terms are used: candidate file A file that matches a MIGRATE, DELETE, or LIST policy rule. chosen file A candidate file that has been scheduled for action. These are the valid values for n: 0 Displays only serious errors. 1 Displays some information as the command runs, but not for each file. This is the default. 2 Displays each chosen file and the scheduled migration or deletion action. 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 
6 Displays the same information as 5, plus non-candidate files and their attributes. For examples and more information on this flag, see the section: The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide. --marc From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 09:17 AM Subject: Re: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Thu Jul 7 20:44:15 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Thu, 7 Jul 2016 15:44:15 -0400 Subject: [gpfsug-discuss] Introductions Message-ID: All, My name is Brian Marshall; I am a computational scientist at Virginia Tech. We have ~2PB GPFS install we are about to expand this Summer and I may have some questions along the way. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jul 8 03:09:30 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Jul 2016 22:09:30 -0400 Subject: [gpfsug-discuss] mmpmon gfis fields question Message-ID: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Does anyone know what the fields in the mmpmon gfis output indicate? # socat /var/mmfs/mmpmon/mmpmonSocket - _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 _node_ local_node mmpmon gfis _response_ begin mmpmon gfis _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 _tu_ 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 _r_ 0 _w_ 0 _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ Here's my best guess: _d_ number of disks in the filesystem _br_ bytes read from disk _bw_ bytes written to disk _c_ cache ops _r_ read ops _w_ write ops _oc_ open() calls _cc_ close() calls _rdc_ read() calls _wc_ write() calls _dir_ readdir calls _iu_ inode update count _irc_ inode read count _idc_ inode delete count _icc_ inode create count _bc_ bytes read from cache _sch_ stat cache hits _scm_ stat cache misses This is all because the mmpmon fs_io_s command doesn't give me a way that I can find to distinguish block/stat cache hits from cache misses which makes it harder to pinpoint misbehaving applications on the system. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Jul 8 03:16:19 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 7 Jul 2016 19:16:19 -0700 Subject: [gpfsug-discuss] mmpmon gfis fields question In-Reply-To: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> References: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Message-ID: Hi, this is a undocumented mmpmon call, so you are on your own, but here is the correct description : _n_ IP address of the node responding. This is the address by which GPFS knows the node. _nn_ The name by which GPFS knows the node. _rc_ The reason/error code. In this case, the reply value is 0 (OK). _t_ Current time of day in seconds (absolute seconds since Epoch (1970)). _tu_ Microseconds part of the current time of day. _cl_ The name of the cluster that owns the file system. _fs_ The name of the file system for which data are being presented. _d_ The number of disks in the file system. _br_ Total number of bytes read from disk (not counting those read from cache.) _bw_ Total number of bytes written, to both disk and cache. _c_ The total number of read operations supplied from cache. _r_ The total number of read operations supplied from disk. _w_ The total number of write operations, to both disk and cache. _oc_ Count of open() call requests serviced by GPFS. 
_cc_ Number of close() call requests serviced by GPFS. _rdc_ Number of application read requests serviced by GPFS. _wc_ Number of application write requests serviced by GPFS. _dir_ Number of readdir() call requests serviced by GPFS. _iu_ Number of inode updates to disk. _irc_ Number of inode reads. _idc_ Number of inode deletions. _icc_ Number of inode creations. _bc_ Number of bytes read from the cache. _sch_ Number of stat cache hits. _scm_ Number of stat cache misses. On Thu, Jul 7, 2016 at 7:09 PM, Aaron Knister wrote: > Does anyone know what the fields in the mmpmon gfis output indicate? > > # socat /var/mmfs/mmpmon/mmpmonSocket - > _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 _node_ > local_node > mmpmon gfis > _response_ begin mmpmon gfis > _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 _tu_ > 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 _r_ 0 _w_ 0 > _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ > > > Here's my best guess: > > _d_ number of disks in the filesystem > _br_ bytes read from disk > _bw_ bytes written to disk > _c_ cache ops > _r_ read ops > _w_ write ops > _oc_ open() calls > _cc_ close() calls > _rdc_ read() calls > _wc_ write() calls > _dir_ readdir calls > _iu_ inode update count > _irc_ inode read count > _idc_ inode delete count > _icc_ inode create count > _bc_ bytes read from cache > _sch_ stat cache hits > _scm_ stat cache misses > > This is all because the mmpmon fs_io_s command doesn't give me a way that > I can find to distinguish block/stat cache hits from cache misses which > makes it harder to pinpoint misbehaving applications on the system. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jul 8 04:13:59 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Jul 2016 23:13:59 -0400 Subject: [gpfsug-discuss] mmpmon gfis fields question In-Reply-To: References: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Message-ID: Ah, thank you! That's a huge help. My preference, of course, would be to use documented calls but I'm already down that rabbit hole calling nsd_ds directly b/c the snmp agent chokes and dies a horrible death with 3.5k nodes and the number of NSDs we have. On 7/7/16 10:16 PM, Sven Oehme wrote: > Hi, > > this is a undocumented mmpmon call, so you are on your own, but here is > the correct description : > > > _n_ > > > > IP address of the node responding. This is the address by which GPFS > knows the node. > > _nn_ > > > > The name by which GPFS knows the node. > > _rc_ > > > > The reason/error code. In this case, the reply value is 0 (OK). > > _t_ > > > > Current time of day in seconds (absolute seconds since Epoch (1970)). > > _tu_ > > > > Microseconds part of the current time of day. > > _cl_ > > > > The name of the cluster that owns the file system. > > _fs_ > > > > The name of the file system for which data are being presented. > > _d_ > > > > The number of disks in the file system. > > _br_ > > > > Total number of bytes read from disk (not counting those read from cache.) > > _bw_ > > > > Total number of bytes written, to both disk and cache. 
> > _c_ > > > > The total number of read operations supplied from cache. > > _r_ > > > > The total number of read operations supplied from disk. > > _w_ > > > > The total number of write operations, to both disk and cache. > > _oc_ > > > > Count of open() call requests serviced by GPFS. > > _cc_ > > > > Number of close() call requests serviced by GPFS. > > _rdc_ > > > > Number of application read requests serviced by GPFS. > > _wc_ > > > > Number of application write requests serviced by GPFS. > > _dir_ > > > > Number of readdir() call requests serviced by GPFS. > > _iu_ > > > > Number of inode updates to disk. > > _irc_ > > > > Number of inode reads. > > _idc_ > > > > Number of inode deletions. > > _icc_ > > > > Number of inode creations. > > _bc_ > > > > Number of bytes read from the cache. > > _sch_ > > > > Number of stat cache hits. > > _scm_ > > > > Number of stat cache misses. > > > On Thu, Jul 7, 2016 at 7:09 PM, Aaron Knister > wrote: > > Does anyone know what the fields in the mmpmon gfis output indicate? > > # socat /var/mmfs/mmpmon/mmpmonSocket - > _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 > _node_ local_node > mmpmon gfis > _response_ begin mmpmon gfis > _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 > _tu_ 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 > _r_ 0 _w_ 0 _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ > > > Here's my best guess: > > _d_ number of disks in the filesystem > _br_ bytes read from disk > _bw_ bytes written to disk > _c_ cache ops > _r_ read ops > _w_ write ops > _oc_ open() calls > _cc_ close() calls > _rdc_ read() calls > _wc_ write() calls > _dir_ readdir calls > _iu_ inode update count > _irc_ inode read count > _idc_ inode delete count > _icc_ inode create count > _bc_ bytes read from cache > _sch_ stat cache hits > _scm_ stat cache misses > > This is all because the mmpmon fs_io_s command doesn't give me a way > that I can find to distinguish block/stat cache hits from cache > misses which makes it harder to pinpoint misbehaving applications on > the system. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Mon Jul 11 17:33:02 2016 From: mweil at wustl.edu (Matt Weil) Date: Mon, 11 Jul 2016 11:33:02 -0500 Subject: [gpfsug-discuss] CES sizing guide In-Reply-To: <375ba33c-894f-215f-4044-e4995761f640@wustl.edu> References: <375ba33c-894f-215f-4044-e4995761f640@wustl.edu> Message-ID: Hello all, > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node > > Is there any more guidance on this as one socket can be a lot of cores and memory today. > > Thanks > ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From mimarsh2 at vt.edu Tue Jul 12 14:12:17 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 12 Jul 2016 09:12:17 -0400 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: All, I have a Spectrum Scale 4.1 cluster serving data to 4 different client clusters (~800 client nodes total). I am looking for ways to monitor filesystem performance to uncover network bottlenecks or job usage patterns affecting performance. I received this info below from an IBM person. Does anyone have examples of aggregating mmperfmon data? Is anyone doing something different? "mmpmon does not currently aggregate cluster-wide data. As of SS 4.1.x you can look at "mmperfmon query" as well, but it also primarily only provides node specific data. The tools are built to script performance data but there aren't any current scripts available for you to use within SS (except for what might be on the SS wiki page). It would likely be something you guys would need to build, that's what other clients have done." Thank you, Brian Marshall Virginia Tech - Advanced Research Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 12 14:19:49 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 12 Jul 2016 13:19:49 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: <39E581EB-978D-4103-A2BC-FE4FF57B3608@nuance.com> Hi Brian I have a couple of pointers: - We have been running mmpmon for a while now across multiple clusters, sticking the data in external database for analysis. This has been working pretty well, but we are transitioning to (below) - SS 4.1 and later have built in zimon for collecting a wealth of performance data - this feeds into the built in GUI. But, there is bridge tools that IBM has built internally and keeps promising to release (I talked about it at the last SS user group meeting at Argonne) that allows use of Grafana with the zimon data. This is working well for us. Let me know if you want to discuss details and I will be happy to share my experiences and pointers in looking at the performance data. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Tuesday, July 12, 2016 at 9:12 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Aggregating filesystem performance All, I have a Spectrum Scale 4.1 cluster serving data to 4 different client clusters (~800 client nodes total). I am looking for ways to monitor filesystem performance to uncover network bottlenecks or job usage patterns affecting performance. I received this info below from an IBM person. Does anyone have examples of aggregating mmperfmon data? Is anyone doing something different? "mmpmon does not currently aggregate cluster-wide data. As of SS 4.1.x you can look at "mmperfmon query" as well, but it also primarily only provides node specific data. The tools are built to script performance data but there aren't any current scripts available for you to use within SS (except for what might be on the SS wiki page). 
It would likely be something you guys would need to build, that's what other clients have done." Thank you, Brian Marshall Virginia Tech - Advanced Research Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jul 12 14:23:12 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 12 Jul 2016 15:23:12 +0200 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Wed Jul 13 09:49:00 2016 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Wed, 13 Jul 2016 10:49:00 +0200 Subject: [gpfsug-discuss] GPFS / Spectrum Scale is now officially certified with SAP HANA on IBM Power Systrems Message-ID: Hi GPFS / Spectrum Scale "addicts", for all those using GPFS / Spectrum Scale "commercially" - IBM has certified it yesterday with SAP and it is NOW officially supported with HANA on IBM Power Systems. Please see the following SAP Note concerning the details. 2055470 - HANA on POWER Planning and Installation Specifics - Central Note: (See attached file: 2055470.pdf) Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Hechtsheimer Str. 2 Email: ckrafft at de.ibm.com 55131 Mainz Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52106945.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2055470.pdf Type: application/pdf Size: 101863 bytes Desc: not available URL: From mimarsh2 at vt.edu Wed Jul 13 14:43:43 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 13 Jul 2016 09:43:43 -0400 Subject: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Message-ID: Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 13 14:59:20 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 13 Jul 2016 13:59:20 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Message-ID: Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a "minimal" install (yes, install using the GUI, don't shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn't find anything specific to this. -------------- next part -------------- An HTML attachment was scrubbed... 
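For the "Aggregating filesystem performance" thread above, a rough sketch of the kind of scripting Bob describes: poll fs_io_s on every node through the documented parseable mmpmon interface and sum the counters per file system. The node list file is an assumption, mmpmon has to run as root, and the counters are cumulative since the daemon started, so in practice you sample on an interval and graph the deltas (or push them into a time-series database, or use the 4.2 zimon collector instead).

   for n in $(cat /tmp/gpfs-nodes); do
       echo fs_io_s | ssh "$n" /usr/lpp/mmfs/bin/mmpmon -p
   done | awk '/^_fs_io_s_/ {
       for (i = 1; i < NF; i++) {
           if ($i == "_fs_") fs = $(i + 1)
           if ($i == "_br_") br[fs] += $(i + 1)
           if ($i == "_bw_") bw[fs] += $(i + 1)
       }
   } END { for (f in br) printf "%s bytes_read=%d bytes_written=%d\n", f, br[f], bw[f] }'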
URL: From Robert.Oesterlin at nuance.com Wed Jul 13 17:06:14 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 13 Jul 2016 16:06:14 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: Hi Brian I haven't seen any problems at all with the monitoring. (impacting performance). As for the Zimon metrics - let me assemble that and maybe discuss indetail off the mailing list (I've BCC'd you on this posting. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 9:43 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jul 13 17:08:32 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 13 Jul 2016 16:08:32 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 13 17:18:18 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 13 Jul 2016 16:18:18 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: References: Message-ID: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> Hi Bob, I am also in the process of setting up monitoring under GPFS (and it will always be GPFS) 4.2 on our test cluster right now and would also be interested in the experiences of others more experienced and knowledgeable than myself. Would you considering posting to the list? Or is there sensitive information that you don?t want to share on the list? Thanks? Kevin On Jul 13, 2016, at 11:06 AM, Oesterlin, Robert > wrote: Hi Brian I haven't seen any problems at all with the monitoring. (impacting performance). As for the Zimon metrics - let me assemble that and maybe discuss indetail off the mailing list (I've BCC'd you on this posting. 
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Wednesday, July 13, 2016 at 9:43 AM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 13 17:29:08 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 13 Jul 2016 16:29:08 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> References: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> Message-ID: <90988968-C133-4965-9A91-13AE1DB8C670@nuance.com> Sure, will do. Nothing sensitive here, just a fairly complex discussion for a mailing list! We'll see - give me a day or so. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 12:18 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance Hi Bob, I am also in the process of setting up monitoring under GPFS (and it will always be GPFS) 4.2 on our test cluster right now and would also be interested in the experiences of others more experienced and knowledgeable than myself. Would you considering posting to the list? Or is there sensitive information that you don?t want to share on the list? Thanks? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jul 13 18:09:24 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 13 Jul 2016 17:09:24 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: The gpfs.protocols package drags in all the openstack swift dependencies (lots of packages). I normally don't want the object support, so just install the nfs-ganesha, samba and zimon packages (plus rsync and python-ldap which I've figured out are needed). But, please beware that rhel7.2 isn't supported with v4.2.0 CES, and I've seen kernel crashes triggered by samba when ignoring that.. -jf ons. 13. jul. 2016 kl. 18.08 skrev Simon Thompson (Research Computing - IT Services) : > > The spectrumscale-protocols rpm (I think that was it) should include all > the os dependencies you need for the various ss bits. > > If you were adding the ss rpms by hand, then there are packages you need > to include. Unfortunately the protocols rpm adds all the protocols whether > you want them or not from what I remember. 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [ > r.sobey at imperial.ac.uk] > Sent: 13 July 2016 14:59 > To: 'gpfsug-discuss at spectrumscale.org' > Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts > > Hi all > > Where can I find documentation on how to prepare RHEL 7.2 for an > installation of SS 4.2 which will be a CES server? Is a ?minimal? install > (yes, install using the GUI, don?t shoot me) sufficient or should I choose > a different canned option. > > Thanks > > Richard > PS I tried looking in the FAQ > http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html > but I couldn?t find anything specific to this. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Jul 14 08:55:33 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 14 Jul 2016 07:55:33 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: Aha. I naively thought it would be. It?s no problem to use 7.1. Thanks for the heads up, and the responses. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 13 July 2016 18:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts The gpfs.protocols package drags in all the openstack swift dependencies (lots of packages). I normally don't want the object support, so just install the nfs-ganesha, samba and zimon packages (plus rsync and python-ldap which I've figured out are needed). But, please beware that rhel7.2 isn't supported with v4.2.0 CES, and I've seen kernel crashes triggered by samba when ignoring that.. -jf ons. 13. jul. 2016 kl. 18.08 skrev Simon Thompson (Research Computing - IT Services) >: The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
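A sketch of the "by hand" route Jan-Frode describes, for sites that want SMB, NFS and zimon without the object stack that gpfs.protocols drags in. The RPM names below are from memory for the 4.2.0 protocols bundle and may differ by release, so treat them as assumptions and check the extracted /usr/lpp/mmfs/4.2.0.x directory; the install toolkit route is covered in the next message.

   yum install -y rsync python-ldap
   # from the extracted protocols bundle: CES SMB, CES NFS (ganesha) and the zimon sensors/collector
   yum localinstall -y gpfs.smb-*.rpm nfs-ganesha-*.rpm gpfs.gss.pmsensors-*.rpm gpfs.gss.pmcollector-*.rpm
   mmchnode --ces-enable -N protocol-node-1    # designate the node as a CES protocol node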
URL: From taylorm at us.ibm.com Fri Jul 15 18:41:27 2016 From: taylorm at us.ibm.com (Michael L Taylor) Date: Fri, 15 Jul 2016 10:41:27 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Sample RHEL 7.2 config Spectrum Scale install toolkit In-Reply-To: References: Message-ID: Hi Richard, The Knowledge Center should help guide you to prepare a RHEL7 node for installation with the /usr/lpp/mmfs/4.2.0.x/installer/spectrumscale install toolkit being a good way to install CES and all of its prerequisites: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_loosein.htm For a high level quick overview of the install toolkit: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocols%20Quick%20Overview%20for%20IBM%20Spectrum%20Scale As mentioned, RHEL7.2 will be supported with CES with the 4.2.1 release due out shortly.... RHEL7.1 on 4.2 will work. Today's Topics: 1. Re: Aggregating filesystem performance (Oesterlin, Robert) (Brian Marshall) 2. Sample RHEL 7.2 config / anaconda scripts (Sobey, Richard A) 3. Re: Aggregating filesystem performance (Oesterlin, Robert) 4. Re: Sample RHEL 7.2 config / anaconda scripts (Simon Thompson (Research Computing - IT Services)) 5. Re: Aggregating filesystem performance (Buterbaugh, Kevin L) Message: 4 Date: Wed, 13 Jul 2016 16:08:32 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Message-ID: Content-Type: text/plain; charset="Windows-1252" The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Sun Jul 17 02:04:39 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sat, 16 Jul 2016 21:04:39 -0400 Subject: [gpfsug-discuss] segment size and sub-block size Message-ID: All, When picking blockSize and segmentSize on RAID6 8+2 LUNs, I have see 2 optimal theories. 1) Make blockSize = # Data Disks * segmentSize e.g. in the RAID6 8+2 case, 8 MB blockSize = 8 * 1 MB segmentSize This makes sense to me as every GPFS block write is a full stripe write 2) Make blockSize = 32 (number sub blocks) * segmentSize; also make sure the blockSize is a multiple of #data disks * segmentSize I don't know enough about GPFS to know how subblocks interact and what tradeoffs this makes. 
Can someone explain (or point to a doc) about sub block mechanics and when to optimize for that? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Jul 17 02:20:31 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sat, 16 Jul 2016 21:20:31 -0400 Subject: [gpfsug-discuss] segment size and sub-block size In-Reply-To: References: Message-ID: <9287130c-70ba-207c-221d-f236bad8acaf@nasa.gov> Hi Brian, We use a 128KB segment size on our DDNs and a 1MB block size and it works quite well for us (throughput in the 10's of gigabytes per second). IIRC the sub block (blockSize/32) is the smallest unit of allocatable disk space. If that's not tuned well to your workload you can end up with a lot of wasted space on the filesystem. In option #1, the smallest unit of allocatable space is 256KB. If you have millions of files that are say 8K in size you can do the math on lost space. In option #2, if you're using the same 1MB segment size from the option 1 scenario it gets even worse. Hope that helps. This might also help (https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_frags.htm). -Aaron On 7/16/16 9:04 PM, Brian Marshall wrote: > All, > > When picking blockSize and segmentSize on RAID6 8+2 LUNs, I have see 2 > optimal theories. > > > 1) Make blockSize = # Data Disks * segmentSize > e.g. in the RAID6 8+2 case, 8 MB blockSize = 8 * 1 MB segmentSize > > This makes sense to me as every GPFS block write is a full stripe write > > 2) Make blockSize = 32 (number sub blocks) * segmentSize; also make sure > the blockSize is a multiple of #data disks * segmentSize > > I don't know enough about GPFS to know how subblocks interact and what > tradeoffs this makes. > > Can someone explain (or point to a doc) about sub block mechanics and > when to optimize for that? > > Thank you, > Brian Marshall > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mimarsh2 at vt.edu Sun Jul 17 03:56:14 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sat, 16 Jul 2016 22:56:14 -0400 Subject: [gpfsug-discuss] SSD LUN setup Message-ID: When setting up SSDs to be used as a fast tier storage pool, are people still doing RAID6 LUNs? I think write endurance is good enough now that this is no longer a big concern (maybe a small concern). I could be wrong. I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Jul 17 14:05:35 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 17 Jul 2016 13:05:35 +0000 Subject: [gpfsug-discuss] SSD LUN setup Message-ID: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> Thinly provisioned (compressed) metadata volumes is unsupported according to IBM. See the GPFS FAQ here, question 4.12: "Placing GPFS metadata on an NSD backed by a thinly provisioned volume is dangerous and unsupported." 
http://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Saturday, July 16, 2016 at 9:56 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] SSD LUN setup I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Sun Jul 17 15:21:13 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sun, 17 Jul 2016 10:21:13 -0400 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> References: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> Message-ID: That's very good advice. In my specific case, I am looking at lowlevel setup of the NSDs in a SSD storage pool with metadata stored elsewhere (on another SSD system). I am wondering if stuff like SSD pagepool size comes into play or if I just look at the segment size from the storage enclosure RAID controller. It sounds like SSDs should be used just like HDDs: group them into RAID6 LUNs. Write endurance is good enough now that longevity is not a problem and there are plenty of IOPs to do parity work. Does this sound right? Anyone doing anything else? Brian On Sun, Jul 17, 2016 at 9:05 AM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Thinly provisioned (compressed) metadata volumes is unsupported according > to IBM. See the GPFS FAQ here, question 4.12: > > > > "Placing GPFS metadata on an NSD backed by a thinly provisioned volume is > dangerous and unsupported." > > > > http://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > > > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > > > > *From: * on behalf of Brian > Marshall > *Reply-To: *gpfsug main discussion list > *Date: *Saturday, July 16, 2016 at 9:56 PM > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[EXTERNAL] [gpfsug-discuss] SSD LUN setup > > > > I have read about other products doing RAID1 with deduplication and > compression to take less than the 50% capacity hit. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sun Jul 17 22:49:53 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sun, 17 Jul 2016 22:49:53 +0100 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: On 17/07/16 03:56, Brian Marshall wrote: > When setting up SSDs to be used as a fast tier storage pool, are people > still doing RAID6 LUNs? I think write endurance is good enough now that > this is no longer a big concern (maybe a small concern). I could be wrong. > > I have read about other products doing RAID1 with deduplication and > compression to take less than the 50% capacity hit. > There are plenty of ways in which an SSD can fail that does not involve problems with write endurance. The idea of using any disks in anything other than a test/dev GPFS file system that you simply don't care about if it goes belly up, that are not RAID or similarly protected is in my view fool hardy in the extreme. 
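For the low-level setup Brian describes above (SSD RAID6 LUNs presented as NSDs in their own data pool, with metadata held elsewhere), a minimal sketch follows. Device, server, NSD and pool names are purely illustrative; it shows the shape, not a recipe.

  # ssd.stanza, one stanza per RAID6 LUN
  %nsd: device=/dev/mapper/ssd_lun01
    nsd=ssd01
    servers=nsdserver01,nsdserver02
    usage=dataOnly
    failureGroup=10
    pool=ssdpool

  mmcrnsd -F ssd.stanza
  mmadddisk fs1 -F ssd.stanza

  # placement rules (installed with mmchpolicy): land new files on SSD, spill to HDD
  RULE 'ssdFirst' SET POOL 'ssdpool' LIMIT(90)
  RULE 'default' SET POOL 'data'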
It would be like saying that HDD's can only fail due to surface defects on the platers, and then getting stung when the drive motor fails or the drive electronics stop working or better yet the drive electrics go puff literately in smoke and there is scorch marks on the PCB. Or how about a drive firmware issue that causes them to play dead under certain work loads, or drive firmware issues that just cause them to die prematurely in large numbers. These are all failure modes I have personally witnessed. My sample size for SSD's is still way to small to have seen lots of wacky failure modes, but I don't for one second believe that given time I won't see them. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Greg.Lehmann at csiro.au Mon Jul 18 00:23:09 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 17 Jul 2016 23:23:09 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Message-ID: Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I've seen reference to a kernel version that is in SLES 12 SP1, but I'm not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 18 01:39:29 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jul 2016 00:39:29 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: OK, after a bit of a delay due to a hectic travel week, here is some more information on my GPFS performance collection. At the bottom, I have links to my server and client zimon config files and a link to my presentation at SSUG Argonne in June. I didn't actually present it but included it in case there was interest. I used to do a home brew system of period calls to mmpmon to collect data, sticking them into a kafka database. This was a bit cumbersome and when SS 4.2 arrived, I switched over to the built in performance sensors (zimon) to collect the data. IBM has a "as-is" bridge between Grafana and the Zimon collector that works reasonably well - they were supposed to release it but it's been delayed - I will ask about it again and post more information if I get it. My biggest struggle with the zimon configuration is the large memory requirement of the collector with large clusters (many clients, file systems, NSDs). I ended up deploying a 6 collector federation of 16gb per collector for my larger clusters -0 even then I have to limit the number of stats and amount of time I retain it. IBM is aware of the memory issue and I believe they are looking at ways to reduce it. As for what specific metrics I tend to look at: gpfs_fis_bytes_read (written) - aggregated file system read and write stats gpfs_nsdpool_bytes_read (written) - aggregated pool stats, as I have data and metadata split gpfs_fs_tot_disk_wait_rd (wr) - NSD disk wait stats These seem to make the most sense for me to get an overall sense of how things are going. I have a bunch of other more details dashboards for individual file systems and clients that help me get details. The built-in SS GUI is pretty good for small clusters, and is getting some improvements in 4.2.1 that might make me take a closer look at it again. 
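For context, the sensor configuration files linked further down follow roughly this shape. This is a heavily trimmed, approximate sketch from memory; check /opt/IBM/zimon/ZIMonSensors.cfg on an installed node for the real sensor names, periods and collector port before relying on it.

  collectors = {
    host = "perfcollector01"
    port = "4739"
  }
  sensors = {
    name = "CPU"
    period = 1
  }, {
    name = "GPFSFilesystem"
    period = 10
  }, {
    name = "GPFSNSDDisk"
    period = 10
  }

Raising the periods or dropping sensors entirely, alongside the retention limits Bob mentions, is one lever for the collector memory problem described above.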
I also look at the RPC waiters stats - no present in 4.2.0 grafana, but I hear are coming in 4.2.1 My SSUG Argonne Presentation (I didn't talk due to time constraints): http://files.gpfsug.org/presentations/2016/anl-june/SSUG_Nuance_PerfTools.pdf Zimon server config file: https://www.dropbox.com/s/gvtfhhqfpsknfnh/ZIMonSensors.cfg.server?dl=0 Zimon client config file: https://www.dropbox.com/s/k5i6rcnaco4vxu6/ZIMonSensors.cfg.client?dl=0 Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 8:43 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Jul 18 15:07:51 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 18 Jul 2016 10:07:51 -0400 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: @Jonathan, I completely agree on the SSD failure. I wasn't suggesting that better write endurance made them impervious to failures, just that I read a few articles from ~3-5 years back saying that RAID5 or RAID6 would destroy your SSDs and have a really high probability of all SSDs failing at the same time as the # of writes were equal on all SSDs in the RAID group. I think that's no longer the case and RAID6 on SSDs is fine. I was looking for examples of what others have done: RAID6, using GPFS data replicas, or some other thing I don't know about that better takes advantage of SSD architecture. Background - I am a storage noob Also is the @Jonathan proper list etiquette? Thanks everyone to great advice I've been getting. Thank you, Brian On Sun, Jul 17, 2016 at 5:49 PM, Jonathan Buzzard wrote: > On 17/07/16 03:56, Brian Marshall wrote: > >> When setting up SSDs to be used as a fast tier storage pool, are people >> still doing RAID6 LUNs? I think write endurance is good enough now that >> this is no longer a big concern (maybe a small concern). I could be >> wrong. >> >> I have read about other products doing RAID1 with deduplication and >> compression to take less than the 50% capacity hit. >> >> > There are plenty of ways in which an SSD can fail that does not involve > problems with write endurance. The idea of using any disks in anything > other than a test/dev GPFS file system that you simply don't care about if > it goes belly up, that are not RAID or similarly protected is in my view > fool hardy in the extreme. > > It would be like saying that HDD's can only fail due to surface defects on > the platers, and then getting stung when the drive motor fails or the drive > electronics stop working or better yet the drive electrics go puff > literately in smoke and there is scorch marks on the PCB. Or how about a > drive firmware issue that causes them to play dead under certain work > loads, or drive firmware issues that just cause them to die prematurely in > large numbers. > > These are all failure modes I have personally witnessed. My sample size > for SSD's is still way to small to have seen lots of wacky failure modes, > but I don't for one second believe that given time I won't see them. > > JAB. > > -- > Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Mon Jul 18 18:34:38 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 18 Jul 2016 19:34:38 +0200 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: Hi Brian, write endurance is one thing you need to run small IOs on on RAID5/RAID6. However, while SSDs are much faster than HDDs when it comes to reads, they are just faster when it comes to writes. The RMW penalty on small writes to RAID5 / RAID6 will incur a higher actual data write rate at your SSD devices than you see going from your OS / file system to the storage. How much higher depends on the actual IO sizes to the RAID device related to your full stripe widths. Mind that the write caches on all levels will help here getting the the IOs larger than what the application does. Beyond a certain point, however, if you go to smaller and smaller IOs (in relation to your stripe widths) you might want to look for some other redundancy code than RAID5/RAID6 or related parity-using mechanisms even if you pay the capacity price of simple data replication (RAID1, or 3w in GNR). That depends of course, but is worth a consideration. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Brian Marshall To: gpfsug main discussion list Date: 07/18/2016 04:08 PM Subject: Re: [gpfsug-discuss] SSD LUN setup Sent by: gpfsug-discuss-bounces at spectrumscale.org @Jonathan, I completely agree on the SSD failure. I wasn't suggesting that better write endurance made them impervious to failures, just that I read a few articles from ~3-5 years back saying that RAID5 or RAID6 would destroy your SSDs and have a really high probability of all SSDs failing at the same time as the # of writes were equal on all SSDs in the RAID group. I think that's no longer the case and RAID6 on SSDs is fine. I was looking for examples of what others have done: RAID6, using GPFS data replicas, or some other thing I don't know about that better takes advantage of SSD architecture. Background - I am a storage noob Also is the @Jonathan proper list etiquette? Thanks everyone to great advice I've been getting. Thank you, Brian On Sun, Jul 17, 2016 at 5:49 PM, Jonathan Buzzard wrote: On 17/07/16 03:56, Brian Marshall wrote: When setting up SSDs to be used as a fast tier storage pool, are people still doing RAID6 LUNs? I think write endurance is good enough now that this is no longer a big concern (maybe a small concern). I could be wrong. 
I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. There are plenty of ways in which an SSD can fail that does not involve problems with write endurance. The idea of using any disks in anything other than a test/dev GPFS file system that you simply don't care about if it goes belly up, that are not RAID or similarly protected is in my view fool hardy in the extreme. It would be like saying that HDD's can only fail due to surface defects on the platers, and then getting stung when the drive motor fails or the drive electronics stop working or better yet the drive electrics go puff literately in smoke and there is scorch marks on the PCB. Or how about a drive firmware issue that causes them to play dead under certain work loads, or drive firmware issues that just cause them to die prematurely in large numbers. These are all failure modes I have personally witnessed. My sample size for SSD's is still way to small to have seen lots of wacky failure modes, but I don't for one second believe that given time I won't see them. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 19 08:59:43 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 19 Jul 2016 07:59:43 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Message-ID: I thought it was supported, but that CES (Integrated protocols support) is only supported up to 7.1 Simon From: > on behalf of "Greg.Lehmann at csiro.au" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 18 July 2016 at 00:23 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I?ve seen reference to a kernel version that is in SLES 12 SP1, but I?m not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 20 01:17:23 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 20 Jul 2016 00:17:23 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale In-Reply-To: References: Message-ID: You are right. An IBMer cleared it up for me. 
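On the general theme of checking a new distro kernel against what the FAQ lists, a quick sanity check on a node looks something like the following (standard Scale and OS commands; the FAQ kernel table remains the authority):

  uname -r                 # running kernel, compare with the FAQ table
  rpm -qa | grep ^gpfs     # installed Scale packages and level
  mmdiag --version         # daemon build level
  mmbuildgpl               # rebuild the portability layer after a kernel update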
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, 19 July 2016 6:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale I thought it was supported, but that CES (Integrated protocols support) is only supported up to 7.1 Simon From: > on behalf of "Greg.Lehmann at csiro.au" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 18 July 2016 at 00:23 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I've seen reference to a kernel version that is in SLES 12 SP1, but I'm not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Jul 20 13:21:19 2016 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Jul 2016 13:21:19 +0100 Subject: [gpfsug-discuss] New AFM Toys Message-ID: Just noticed this in the 4.2.0-4 release notes: * Fix the readdir performance issue of independent writer mode filesets in the AFM environment. Introduce a new configuration option afmDIO at the fileset level to replicate data from cache to home using direct IO. Before I start weeping tears of joy, does anyone have any further info on this (the issue and the new parameter?) Does this apply to both NFS and GPFS transpots? It looks very promising! -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 20 15:42:09 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Jul 2016 14:42:09 +0000 Subject: [gpfsug-discuss] Migrating to CES from CTDB Message-ID: Hi all, Does anyone have any experience migrating from CTDB and GPFS 3.5 to CES and GPFS 4.2? We've got a plan of how to do it, but the issue is doing it without causing any downtime to the front end. We're using "secrets and keytab" for auth in smb.conf. So the only way I think we can do it is build out the 4.2 servers and somehow integrate them into the existing cluster (front end cluster) - or more accurately - keep the same FQDN of the cluster and just change DNS to point the FDQN to the new servers, and remove it from the existing ones. The big question is: will this work in theory? The downtime option involves shutting down the CTDB cluster, deleting the AD object, renaming the new cluster and starting CES SMB to allow it to join AD with the same name. 
This takes about 15 minutes while allowing for AD replication and the TTL on the DNS. This then has to be repeated when the original CTDB nodes have been reinstalled. I feel like I'm rambling but I just can't find any guides on migrating protocols from CTDB to CES, just migrations of GPFS itself. Plus my knowledge of samba isn't all that :) Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Jul 20 16:15:39 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 20 Jul 2016 16:15:39 +0100 Subject: [gpfsug-discuss] Migrating to CES from CTDB In-Reply-To: References: Message-ID: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> On Wed, 2016-07-20 at 14:42 +0000, Sobey, Richard A wrote: [SNIP] > > The downtime option involves shutting down the CTDB cluster, deleting > the AD object, renaming the new cluster and starting CES SMB to allow > it to join AD with the same name. This takes about 15 minutes while > allowing for AD replication and the TTL on the DNS. This then has to > be repeated when the original CTDB nodes have been reinstalled. > Can you not reduce the TTL on the DNS to as low as possible prior to the changeover to reduce the required downtime for the switch over? You are also aware that you can force the AD replication so no need to wait for that, other than the replication time, which should be pretty quick? I also believe that it is not necessary to delete the AD object. Just leave it as is, and it will be overwritten when you join the new CES cluster. Saves you a step. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From r.sobey at imperial.ac.uk Wed Jul 20 16:23:00 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Jul 2016 15:23:00 +0000 Subject: [gpfsug-discuss] Migrating to CES from CTDB In-Reply-To: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> References: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> Message-ID: I was thinking of that. Current TTL is 900s, we can probably lower it on a temporary basis to facilitate the change. I wasn't aware about the AD object, no... I presume the existing object will simply be updated when the new cluster joins, which in turn will trigger a replication of it anyway? Thanks Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: 20 July 2016 16:16 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Migrating to CES from CTDB On Wed, 2016-07-20 at 14:42 +0000, Sobey, Richard A wrote: [SNIP] > > The downtime option involves shutting down the CTDB cluster, deleting > the AD object, renaming the new cluster and starting CES SMB to allow > it to join AD with the same name. This takes about 15 minutes while > allowing for AD replication and the TTL on the DNS. This then has to > be repeated when the original CTDB nodes have been reinstalled. > Can you not reduce the TTL on the DNS to as low as possible prior to the changeover to reduce the required downtime for the switch over? You are also aware that you can force the AD replication so no need to wait for that, other than the replication time, which should be pretty quick? I also believe that it is not necessary to delete the AD object. Just leave it as is, and it will be overwritten when you join the new CES cluster. Saves you a step. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Wed Jul 20 18:23:32 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 17:23:32 +0000 Subject: [gpfsug-discuss] More fun with Policies Message-ID: Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E281.87334DC0] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30212 bytes Desc: image001.png URL: From jamiedavis at us.ibm.com Wed Jul 20 19:17:09 2016 From: jamiedavis at us.ibm.com (James Davis) Date: Wed, 20 Jul 2016 18:17:09 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D1E281.87334DC0.png Type: image/png Size: 30212 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 20 19:24:02 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 18:24:02 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: References: Message-ID: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> Thanks James. I did just that (running 4.2.0.3). mmchpolicy fs1 DEFAULT. It didn?t fix the gui however I wonder if it?s a bug in the gui code or something like that. From: on behalf of James Davis Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:17 PM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] More fun with Policies Hi Mark, I don't have an answer about the GUI change, but I believe as of 4.1 you can "delete" a policy by using mmchpolicy like this: #14:15:36# c42an3:~ # mmchpolicy c42_fs2_dmapi DEFAULT GPFS: 6027-2809 Validated policy 'DEFAULT': GPFS: 6027-799 Policy `DEFAULT' installed and broadcast to all nodes. 
#14:16:06# c42an3:~ # mmlspolicy c42_fs2_dmapi -L /* DEFAULT */ /* Store data in the first data pool or system pool */ If your release doesn't support that, try a simple policy like RULE 'default' SET POOL 'system' or RULE 'default' SET POOL '' Cheers, Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] More fun with Policies Date: Wed, Jul 20, 2016 1:24 PM Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E289.FAA9E450] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30213 bytes Desc: image001.png URL: From Mark.Bush at siriuscom.com Wed Jul 20 19:45:41 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 18:45:41 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> References: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> Message-ID: I killed my browser cache and all is well now. From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] More fun with Policies Thanks James. I did just that (running 4.2.0.3). mmchpolicy fs1 DEFAULT. It didn?t fix the gui however I wonder if it?s a bug in the gui code or something like that. 
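As an aside for anyone replacing a policy from the CLI: mmchpolicy can validate a rules file before installing it, which is a cheap way to catch typos. A small illustrative sequence (filesystem and file names made up):

  echo "RULE 'default' SET POOL 'system'" > placement.pol
  mmchpolicy fs1 placement.pol -I test   # parse and validate only
  mmchpolicy fs1 placement.pol           # install
  mmlspolicy fs1 -L                      # confirm what is active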
From: on behalf of James Davis Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:17 PM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] More fun with Policies Hi Mark, I don't have an answer about the GUI change, but I believe as of 4.1 you can "delete" a policy by using mmchpolicy like this: #14:15:36# c42an3:~ # mmchpolicy c42_fs2_dmapi DEFAULT GPFS: 6027-2809 Validated policy 'DEFAULT': GPFS: 6027-799 Policy `DEFAULT' installed and broadcast to all nodes. #14:16:06# c42an3:~ # mmlspolicy c42_fs2_dmapi -L /* DEFAULT */ /* Store data in the first data pool or system pool */ If your release doesn't support that, try a simple policy like RULE 'default' SET POOL 'system' or RULE 'default' SET POOL '' Cheers, Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] More fun with Policies Date: Wed, Jul 20, 2016 1:24 PM Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E28D.0011FCE0] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30214 bytes Desc: image001.png URL: From Mark.Bush at siriuscom.com Wed Jul 20 21:47:13 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 20:47:13 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario Message-ID: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. 
Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Wed Jul 20 22:15:49 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 20 Jul 2016 17:15:49 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? 
I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Jul 20 22:27:03 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 20 Jul 2016 21:27:03 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Hi Mark, We do this. 
We have sync replication between two sites with extended san and Ethernet fabric between them. We then use copies=2 for both metadata and data (most filesets). We then also have a vm quorum node which runs on VMware in a fault tolerant cluster. We tested split braining the sites before we went into production. It does work, but we did find some interesting failure modes doing the testing, so do that and push it hard. We multicluster our ces nodes (yes technically I know isn't supported), and again have a quorum vm which has dc affinity to the storage cluster one to ensure ces fails to the same DC. You may also want to look at readReplicaPolicy=local and Infiniband fabric numbers, and probably subnets to ensure your clients prefer the local site for read. Write of course needs enough bandwidth between sites to keep it fast. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 20 July 2016 21:47 To: gpfsug main discussion list Subject: [gpfsug-discuss] NDS in Two Site scenario For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From makaplan at us.ibm.com Wed Jul 20 22:52:25 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 20 Jul 2016 17:52:25 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. 
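To make the replication side of the setup Simon describes above concrete, a minimal sketch (failure-group layout and values are illustrative, and the stretched-cluster caveats in this thread all still apply):

  # in the NSD stanzas, give each site its own failure group,
  # e.g. site 1 disks failureGroup=1 and site 2 disks failureGroup=2
  mmcrfs fs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2   # two copies of metadata and data
  mmchconfig readReplicaPolicy=local             # prefer reads from the local copy
  mmlsfs fs1 -m -r                               # confirm the replication settings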
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Jul 20 23:34:53 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 20 Jul 2016 23:34:53 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: <3e1dc902-ca52-4ab1-2ca3-e51ba3f18b32@buzzard.me.uk> On 20/07/16 22:15, Ken Hill wrote: [SNIP] > You can further isolate failure by increasing quorum (odd numbers). > > The way quorum works is: The majority of the quorum nodes need to be up > to survive an outage. > > - With 3 quorum nodes you can have 1 quorum node failures and continue > filesystem operations. > - With 5 quorum nodes you can have 2 quorum node failures and continue > filesystem operations. > - With 7 quorum nodes you can have 3 quorum node failures and continue > filesystem operations. > - etc > The alternative is a tiebreaker disk to prevent split brain clusters. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Mark.Bush at siriuscom.com Thu Jul 21 00:33:06 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 23:33:06 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. 
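A short illustrative sketch of the quorum mechanics described above (node and NSD names are made up):

  # three quorum nodes: one per site plus the isolated third location
  mmchnode --quorum -N server1,server3,server5
  mmlscluster                                    # check the quorum designations
  # smaller clusters can use tiebreaker disks instead of a third quorum site
  mmchconfig tiebreakerDisks="nsd_a;nsd_b;nsd_c"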
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E2B5.280C2EA0] [cid:image002.png at 01D1E2B5.280C2EA0] [cid:image003.png at 01D1E2B5.280C2EA0] [cid:image004.png at 01D1E2B5.280C2EA0] [cid:image005.png at 01D1E2B5.280C2EA0] [cid:image006.png at 01D1E2B5.280C2EA0] [cid:image007.png at 01D1E2B5.280C2EA0] [cid:image008.png at 01D1E2B5.280C2EA0] [cid:image009.png at 01D1E2B5.280C2EA0] [cid:image010.png at 01D1E2B5.280C2EA0] [cid:image011.png at 01D1E2B5.280C2EA0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: From Mark.Bush at siriuscom.com Thu Jul 21 00:34:29 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 23:34:29 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Thu Jul 21 01:02:01 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 20 Jul 2016 20:02:01 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. 
If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 979 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1426 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1369 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4454 bytes Desc: not available URL: From YARD at il.ibm.com Thu Jul 21 05:48:09 2016 From: YARD at il.ibm.com (Yaron Daniel) Date: Thu, 21 Jul 2016 07:48:09 +0300 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: HI U must remember the following: Network vlan should be the same between 2 Main Sites - since the CES IP failover will not work... U can define : Site1 - 2 x NSD servers + Quorum Site2 - 2 x NSD servers + Quorum GPFS FS replication define with failure groups. (Latency must be very low in order to have write performance). Site3 - 1 x Quorum + Local disk as Tie Breaker Disk. (Desc Only) Hope this help. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Ken Hill" To: gpfsug main discussion list Date: 07/21/2016 03:02 AM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. 
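To make the quorum arithmetic above concrete: the cluster stays up while a majority of the quorum nodes are reachable, so 3, 5 or 7 quorum nodes tolerate 1, 2 or 3 quorum-node failures respectively. A minimal sketch of the third-site layout Ken and Yaron describe might look like the commands below; the node names, NSD name and device are invented, and the exact options are worth checking against the mmchnode, mmcrnsd and mmchconfig documentation for your release.

# Two quorum nodes per main site plus one small quorum node at a third location
mmchnode --quorum -N site1-nsd1,site1-nsd2,site2-nsd1,site2-nsd2,site3-q1

# Descriptor-only disk at the third site, so a file system descriptor replica
# survives the loss of either main site (stanza file for mmcrnsd)
%nsd: nsd=site3desc1 device=/dev/sdb servers=site3-q1 usage=descOnly failureGroup=3

# Alternatively, with a small number of quorum nodes and SAN-visible disks,
# tiebreaker disks can be used instead of extra quorum nodes:
mmchconfig tiebreakerDisks="site1nsd1;site2nsd1;site3desc1"

# Check which nodes carry the quorum designation
mmlscluster

Whether tiebreaker disks or an extra quorum node makes more sense depends largely on whether every quorum node can actually see the disks, which comes up again further down the thread.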
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 979 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1426 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1369 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4454 bytes Desc: not available URL: From ashish.thandavan at cs.ox.ac.uk Thu Jul 21 11:26:02 2016 From: ashish.thandavan at cs.ox.ac.uk (Ashish Thandavan) Date: Thu, 21 Jul 2016 11:26:02 +0100 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience Message-ID: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Dear all, Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. 
We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use?

Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. Is there a recommended bonding mode?

If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond you have configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down?

Thank you,

Regards,
Ash

--
-------------------------
Ashish Thandavan
UNIX Support Computing Officer
Department of Computer Science
University of Oxford
Wolfson Building
Parks Road
Oxford OX1 3QD

Phone: 01865 610733
Email: ashish.thandavan at cs.ox.ac.uk

From Mark.Bush at siriuscom.com Thu Jul 21 13:45:12 2016
From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com)
Date: Thu, 21 Jul 2016 12:45:12 +0000
Subject: [gpfsug-discuss] NDS in Two Site scenario
In-Reply-To:
References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com>
Message-ID: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com>

This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I'm being persistent but this for some reason confuses me.

Site1
NSD Node1
---NSD1 --- Physical LUN1 from SAN1
NSD Node2

Site2
NSD Node3
---NSD2 --- Physical LUN2 from SAN2
NSD Node4

Or

Site1
NSD Node1
---NSD1 --- Physical LUN1 from SAN1
---NSD2 --- Physical LUN2 from SAN2
NSD Node2

Site 2
NSD Node3
---NSD2 --- Physical LUN2 from SAN2
---NSD1 --- Physical LUN1 from SAN1
NSD Node4

Site 3
Node5 Quorum

From: on behalf of Ken Hill
Reply-To: gpfsug main discussion list
Date: Wednesday, July 20, 2016 at 7:02 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario

Yes - it is a cluster.

The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc).
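Picking up Ash's bonding question from earlier in this digest: a minimal active-backup (mode 1) bond on a RHEL-style system might look like the sketch below. The interface names, address and file paths are placeholders, and whether active-backup or 802.3ad/LACP is the better choice depends on the switch pair, so treat this purely as an illustration rather than a recommendation.

# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
IPADDR=10.10.10.11
PREFIX=24
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-em1 (and similarly for the second port, em2)
DEVICE=em1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

# After restarting networking, confirm which slave is active and then test by
# deliberately failing one link:
cat /proc/net/bonding/bond0

For 802.3ad the BONDING_OPTS line would become something like "mode=802.3ad miimon=100" and the two switch ports would need to be configured as an LACP aggregate across the stack; active-backup needs no switch-side configuration, which is one reason it is often preferred for a daemon/heartbeat network.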
Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E323.CF61AAE0] [cid:image002.png at 01D1E323.CF61AAE0] [cid:image003.png at 01D1E323.CF61AAE0] [cid:image004.png at 01D1E323.CF61AAE0] [cid:image005.png at 01D1E323.CF61AAE0] [cid:image006.png at 01D1E323.CF61AAE0] [cid:image007.png at 01D1E323.CF61AAE0] [cid:image008.png at 01D1E323.CF61AAE0] [cid:image009.png at 01D1E323.CF61AAE0] [cid:image010.png at 01D1E323.CF61AAE0] [cid:image011.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E323.CF61AAE0] [cid:image013.png at 01D1E323.CF61AAE0] [cid:image014.png at 01D1E323.CF61AAE0] [cid:image015.png at 01D1E323.CF61AAE0] [cid:image016.png at 01D1E323.CF61AAE0] [cid:image017.png at 01D1E323.CF61AAE0] [cid:image018.png at 01D1E323.CF61AAE0] [cid:image019.png at 01D1E323.CF61AAE0] [cid:image020.png at 01D1E323.CF61AAE0] [cid:image021.png at 01D1E323.CF61AAE0] [cid:image022.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? 
I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 1622 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image013.png Type: image/png Size: 1598 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 1073 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 980 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 1565 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 1314 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 1169 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image019.png Type: image/png Size: 1427 bytes Desc: image019.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 1370 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.png Type: image/png Size: 1245 bytes Desc: image021.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.png Type: image/png Size: 4455 bytes Desc: image022.png URL: From jonathan at buzzard.me.uk Thu Jul 21 14:01:06 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 21 Jul 2016 14:01:06 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: <1469106066.26989.33.camel@buzzard.phy.strath.ac.uk> On Thu, 2016-07-21 at 12:45 +0000, Mark.Bush at siriuscom.com wrote: > This is where my confusion sits. So if I have two sites, and two NDS > Nodes per site with 1 NSD (to keep it simple), do I just present the > physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to > Site2 NSD Nodes? Unless you are going to use a tiebreaker disk you need an odd number of NSD nodes. If you don't you risk a split brain cluster and well god only knows what will happen to your file system in such a scenario. > Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and > the same at Site2? (Assuming SAN and not direct attached in this > case). I know I?m being persistent but this for some reason confuses > me. That's one way of doing it assuming that you have extended your SAN across both sites. You present all LUN's to all NSD nodes regardless of which site they are at. With this method you can use a tiebreaker disk. Alternatively you present the LUN's at site one to the NSD servers at site one and all the LUN's at site two to the NSD servers at site two, and set failure and replication groups up appropriately. However in this scenario it is critical to have an odd number of NSD servers because you can only use tiebreaker disks where every NSD node can see the physical disk aka it's SAN attached (either FC or iSCSI) to all NSD nodes. That said as others have pointed out, beyond a metropolitan area network I can't see multi site GPFS working. You could I guess punt iSCSI over the internet but performance is going to be awful, and iSCSI and GPFS just don't mix in my experience. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom.

From S.J.Thompson at bham.ac.uk Thu Jul 21 14:02:03 2016
From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services))
Date: Thu, 21 Jul 2016 13:02:03 +0000
Subject: [gpfsug-discuss] NDS in Two Site scenario
In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com>
References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com>
Message-ID:

It depends. What are you protecting against? Either will work depending on your acceptable failure modes.

I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site.

In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written as the NSD servers in Site 2 could write to the SAN in Site 1. In the first example you would need to restripe the file-system when bringing Site 1 back online to ensure data is replicated.

Simon

From: > on behalf of "Mark.Bush at siriuscom.com" >
Reply-To: "gpfsug-discuss at spectrumscale.org" >
Date: Thursday, 21 July 2016 at 13:45
To: "gpfsug-discuss at spectrumscale.org" >
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario

This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I'm being persistent but this for some reason confuses me.
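As a rough illustration of Simon's copies=2 point (all NSD names, device paths and the file system name here are invented, and the options are worth checking against the mmcrnsd, mmcrfs, mmchdisk and mmrestripefs documentation for your release): each site's LUNs go into their own failure group, the file system is created with two data and two metadata replicas, and after an outage the affected disks are started again and the file system re-replicated.

# NSD stanzas - one failure group per site; all four servers are listed
# because in "Scenario 2" every LUN is SAN-visible from every NSD server
%nsd: nsd=site1nsd1 device=/dev/mapper/site1_lun1 servers=nsd1,nsd2,nsd3,nsd4 usage=dataAndMetadata failureGroup=1
%nsd: nsd=site2nsd1 device=/dev/mapper/site2_lun1 servers=nsd3,nsd4,nsd1,nsd2 usage=dataAndMetadata failureGroup=2

mmcrnsd -F twosite.stanza
mmcrfs fs1 -F twosite.stanza -m 2 -M 2 -r 2 -R 2    # two copies of data and metadata

# After the failed site's storage is reachable again:
mmchdisk fs1 start -a      # bring the down disks back into service
mmrestripefs fs1 -r        # restore files to their intended replication level

With that layout GPFS keeps one copy of every block in failure group 1 and one in failure group 2, which is what makes "lose a whole site and keep running" possible; the restripe afterwards only catches up the copies that could not be written while one failure group was unavailable.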
From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E323.CF61AAE0] [cid:image013.png at 01D1E323.CF61AAE0] [cid:image014.png at 01D1E323.CF61AAE0] [cid:image015.png at 01D1E323.CF61AAE0] [cid:image016.png at 01D1E323.CF61AAE0] [cid:image017.png at 01D1E323.CF61AAE0] [cid:image018.png at 01D1E323.CF61AAE0] [cid:image019.png at 01D1E323.CF61AAE0] [cid:image020.png at 01D1E323.CF61AAE0] [cid:image021.png at 01D1E323.CF61AAE0] [cid:image022.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 1622 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 1598 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 1073 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 980 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 1565 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 1314 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 1169 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image019.png Type: image/png Size: 1427 bytes Desc: image019.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 1370 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.png Type: image/png Size: 1245 bytes Desc: image021.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.png Type: image/png Size: 4455 bytes Desc: image022.png URL: From viccornell at gmail.com Thu Jul 21 14:02:02 2016 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Jul 2016 14:02:02 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: The annoying answer is "it depends?. I ran a system with all of the NSDs being visible to all of the NSDs on both sites and that worked well. However there are lots of questions to answer: Where are the clients going to live? Will you have clients in both sites or just one? Is it dual site working or just DR? Where will the majority of the writes happen? Would you rather that traffic went over the SAN or the IP link? Do you have a SAN link between the 2 sites? Which is faster, the SAN link between sites or the IP link between the sites? Are they the same link? Are they both redundant, which is the most stable? The answers to these questions would drive the design of the gpfs filesystem. For example if there are clients on only on site A , you might then make the NSD servers on site A the primary NSD servers for all of the NSDs on site A and site B - then you send the replica blocks over the SAN. You also could make a matrix of the failure scenarios you want to protect against, the consequences of the failure and the likelihood of failure etc. That will also inform the design. Does that help? Vic > On 21 Jul 2016, at 1:45 pm, Mark.Bush at siriuscom.com wrote: > > This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. > > Site1 > NSD Node1 > ---NSD1 ---Physical LUN1 from SAN1 > NSD Node2 > > > Site2 > NSD Node3 > ---NSD2 ?Physical LUN2 from SAN2 > NSD Node4 > > > Or > > > Site1 > NSD Node1 > ----NSD1 ?Physical LUN1 from SAN1 > ----NSD2 ?Physical LUN2 from SAN2 > NSD Node2 > > Site 2 > NSD Node3 > ---NSD2 ? Physical LUN2 from SAN2 > ---NSD1 --Physical LUN1 from SAN1 > NSD Node4 > > > Site 3 > Node5 Quorum > > > > From: > on behalf of Ken Hill > > Reply-To: gpfsug main discussion list > > Date: Wednesday, July 20, 2016 at 7:02 PM > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > > Yes - it is a cluster. > > The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). 
> > Regards, > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > From: "Mark.Bush at siriuscom.com " > > To: gpfsug main discussion list > > Date: 07/20/2016 07:33 PM > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > So in this scenario Ken, can server3 see any disks in site1? > > From: > on behalf of Ken Hill > > Reply-To: gpfsug main discussion list > > Date: Wednesday, July 20, 2016 at 4:15 PM > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > > > Site1 Site2 > Server1 (quorum 1) Server3 (quorum 2) > Server2 Server4 > > > > > SiteX > Server5 (quorum 3) > > > > > You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. > > You can further isolate failure by increasing quorum (odd numbers). > > The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. > > - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. > - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. > - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. > - etc > > Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > From: "Mark.Bush at siriuscom.com " > > To: gpfsug main discussion list > > Date: 07/20/2016 04:47 PM > Subject: [gpfsug-discuss] NDS in Two Site scenario > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. > > > > Mark R. Bush| Solutions Architect > Mobile: 210.237.8415 | mark.bush at siriuscom.com > Sirius Computer Solutions | www.siriuscom.com > 10100 Reunion Place, Suite 500, San Antonio, TX 78216 > > This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. 
This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.
>
> Sirius Computer Solutions
_______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Mark.Bush at siriuscom.com Thu Jul 21 14:12:58 2016
From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com)
Date: Thu, 21 Jul 2016 13:12:58 +0000
Subject: [gpfsug-discuss] NDS in Two Site scenario
In-Reply-To:
References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com>
Message-ID: <10D22907-E641-41AF-A31A-17755288E005@siriuscom.com>

Thanks Vic & Simon, I'm totally cool with 'it depends'. The solution guidance is to achieve a Highly Available FS, and there is Dark Fibre between the two locations. FileNet is the application and they want two things: the ability to write in both locations (maybe close to the same time, not necessarily the same files though) and protection against any site failure. So in my mind my Scenario 1 would work as long as copies=2 and restripes are acceptable. In my Scenario 2 I would still have to restripe if the SAN in site 1 went down. I'm looking for the simplest approach that provides the greatest availability.

From: on behalf of "Simon Thompson (Research Computing - IT Services)"
Reply-To: gpfsug main discussion list
Date: Thursday, July 21, 2016 at 8:02 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario

It depends. What are you protecting against? Either will work depending on your acceptable failure modes.

I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site.

In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written as the NSD servers in Site 2 could write to the SAN in Site 1. In the first example you would need to restripe the file-system when bringing Site 1 back online to ensure data is replicated.

Simon

From: > on behalf of "Mark.Bush at siriuscom.com" >
Reply-To: "gpfsug-discuss at spectrumscale.org" >
Date: Thursday, 21 July 2016 at 13:45
To: "gpfsug-discuss at spectrumscale.org" >
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario

This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I'm being persistent but this for some reason confuses me.
Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E327.B037C650] [cid:image002.png at 01D1E327.B037C650] [cid:image003.png at 01D1E327.B037C650] [cid:image004.png at 01D1E327.B037C650] [cid:image005.png at 01D1E327.B037C650] [cid:image006.png at 01D1E327.B037C650] [cid:image007.png at 01D1E327.B037C650] [cid:image008.png at 01D1E327.B037C650] [cid:image009.png at 01D1E327.B037C650] [cid:image010.png at 01D1E327.B037C650] [cid:image011.png at 01D1E327.B037C650] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. 
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E327.B037C650] [cid:image013.png at 01D1E327.B037C650] [cid:image014.png at 01D1E327.B037C650] [cid:image015.png at 01D1E327.B037C650] [cid:image016.png at 01D1E327.B037C650] [cid:image017.png at 01D1E327.B037C650] [cid:image018.png at 01D1E327.B037C650] [cid:image019.png at 01D1E327.B037C650] [cid:image020.png at 01D1E327.B037C650] [cid:image021.png at 01D1E327.B037C650] [cid:image022.png at 01D1E327.B037C650] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1622 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1598 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1073 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From makaplan at us.ibm.com Thu Jul 21 14:33:47 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 21 Jul 2016 09:33:47 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID:

I don't know. That said, let's be logical and cautious. Your network performance has got to be comparable to (preferably better than!) your disk/storage system. Think speed, latency, bandwidth, jitter, reliability, security. For a production system with data you care about, that probably means a dedicated/private/reserved channel, probably on private or leased fiber.
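A rough, hedged illustration of sizing up such a link before trusting it with synchronous replication; the remote host name is hypothetical and iperf3 is an external tool, not part of GPFS:

  # round-trip time and packet loss to an NSD server at the other site
  ping -c 100 -i 0.2 siteB-nsd1 | tail -2

  # sustained TCP bandwidth across the inter-site link, four parallel streams
  iperf3 -c siteB-nsd1 -t 30 -P 4

Numbers like these only set a floor; behaviour under the real write workload is what ultimately matters.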
Sure you can cobble together a demo, proof-of-concept, or prototype with less than that, but are you going to bet your career, life, friendships, data on that? Then you have to work through and test failure and recover scenarios... This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... Is there a sale or marketing team selling this? What do they recommend? Here is an excerpt from an IBM white paper I found by googling... Notice the qualifier "high quality wide area network": "...Synchronous replication works well for many workloads by replicating data across storage arrays within a data center, within a campus or across geographical distances using high quality wide area network connections. When wide area network connections are not high performance or are not reliable, an asynchronous approach to data replication is required. GPFS 3.5 introduces a feature called Active File Management (AFM). ..." Of course GPFS has improved (and been renamed!) since 3.5 but 4.2 cannot magically compensate for a not-so-high-quality network! From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:34 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Thu Jul 21 15:01:01 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 21 Jul 2016 14:01:01 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: So just to be clear, my DCs are about 1.5kM as the fibre goes. We have dedicated extended SAN fibre and also private multi-10GbE links between the sites with Ethernet fabric switches. Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 14:33 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 21 15:01:49 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 14:01:49 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Well said Marc. I think in IBM?s marketing pitches they make it sound so simple and easy. But this doesn?t take the place of well planned, tested, and properly sized implementations. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, July 21, 2016 at 8:33 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario I don't know. That said, let's be logical and cautious. Your network performance has got to be comparable to (preferably better than!) your disk/storage system. Think speed, latency, bandwidth, jitter, reliability, security. For a production system with data you care about, that probably means a dedicated/private/reserved channel, probably on private or leased fiber. Sure you can cobble together a demo, proof-of-concept, or prototype with less than that, but are you going to bet your career, life, friendships, data on that? Then you have to work through and test failure and recover scenarios... This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... Is there a sale or marketing team selling this? What do they recommend? Here is an excerpt from an IBM white paper I found by googling... Notice the qualifier "high quality wide area network": "...Synchronous replication works well for many workloads by replicating data across storage arrays within a data center, within a campus or across geographical distances using high quality wide area network connections. When wide area network connections are not high performance or are not reliable, an asynchronous approach to data replication is required. GPFS 3.5 introduces a feature called Active File Management (AFM). ..." Of course GPFS has improved (and been renamed!) since 3.5 but 4.2 cannot magically compensate for a not-so-high-quality network! From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:34 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? 
I'm not sure marketing is in line with this then.

From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario

Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments.

This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.

Sirius Computer Solutions
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From eboyd at us.ibm.com Thu Jul 21 15:39:18 2016 From: eboyd at us.ibm.com (Edward Boyd) Date: Thu, 21 Jul 2016 14:39:18 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario @ Mark Bush In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL:

From sjhoward at iu.edu Thu Jul 21 16:21:04 2016 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Thu, 21 Jul 2016 15:21:04 +0000 Subject: [gpfsug-discuss] Performance Issues with SMB/NFS to GPFS Backend Message-ID: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu>

Hi All,

I have a two-site replicated GPFS cluster running GPFS v3.5.0-26. We have recently run into a performance problem while exporting an SMB mount to one of our client labs. Specifically, this lab is attempting to run a MatLab SPM job in the SMB share and seeing sharply degraded performance versus running it over NFS to their own NFS service. The job does time-slice correction on MRI image volumes that result in roughly 15,000 file creates, plus at least one read and at least one write to each file.

Here is a list that briefly describes the time-to-completion for this job, as run under various conditions:

1) Backed by their local fileserver, running over NFS - 5 min
2) Backed by our GPFS, running over SMB - 30 min
3) Backed by our GPFS, running over NFS - 20 min
4) Backed by local disk on our exporting protocol node, over SMB - 6 min
5) Backed by local disk on our exporting protocol node, over NFS - 6 min
6) Backed by GPFS, running over GPFS native client on our supercomputer - 2 min

From this list, it seems that the performance problems arise when combining either SMB or NFS with the GPFS backend.
It is our conclusion that neither SMB nor NFS per se creates the problem; exporting a local disk share over either of these protocols yields decent performance. Do you have any insight as to why the combination of the GPFS back-end with either NFS or SMB yields such anemic performance? Can you offer any tuning recommendations that may improve the performance when running over SMB to the GPFS back-end (our preferred method of deployment)?

Thank you so much for your help as always!

Stewart Howard Indiana University

From makaplan at us.ibm.com Thu Jul 21 16:44:17 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 21 Jul 2016 11:44:17 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> References: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> Message-ID:

[Apologies] It has been pointed out to me that anyone seriously interested in clusters split over multiple sites should ReadTheFineManuals and in particular chapter 6 of the GPFS or Spectrum Scale Advanced Admin Guide. I apologize for anything I said that may have contradicted TFMs. Still it seems any which way you look at it - state of the art, today, this is not an easy plug and play, tab A into slot A, tab B into slot B and we're done - kinda-thing.

From aaron.s.knister at nasa.gov Thu Jul 21 17:04:48 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 21 Jul 2016 16:04:48 +0000 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support Message-ID: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov>

Hi Everyone,

I've noticed that many GPFS commands (mm*acl, mm*attr) and API calls (in particular the putacl and getacl functions) have no support for not following symlinks. Is there some hidden support for gpfs_putacl that will cause it to not dereference symbolic links? Something like the O_NOFOLLOW flag used elsewhere in Linux?

Thanks!

-Aaron

From Luke.Raimbach at crick.ac.uk Thu Jul 21 18:15:18 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 21 Jul 2016 17:15:18 +0000 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Message-ID:

Dear GPFS Experts,

I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming....

What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers?

Like Barry says, new AFM toys are great. Can we have more, please?

Cheers,
Luke.

Luke Raimbach Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.
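As a hedged aside, one way to confirm what mode a cache fileset is actually running in before deciding how to handle snapshots (file system and fileset names are hypothetical; check the exact options against your Scale level):

  # show the AFM attributes of the cache fileset, including its mode
  mmlsfileset fs1 cache1 --afm -L

  # show the cache state and pending queue for the fileset
  mmafmctl fs1 getstate -j cache1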
From shankbal at in.ibm.com Fri Jul 22 01:51:53 2016 From: shankbal at in.ibm.com (Shankar Balasubramanian) Date: Fri, 22 Jul 2016 06:21:53 +0530 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 22 09:36:40 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 22 Jul 2016 08:36:40 +0000 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: Hi Ash Our ifcfg files for the bonded interfaces (this applies to GPFS, data and mgmt networks) are set to mode1: BONDING_OPTS="mode=1 miimon=200" If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it setup right as we've never noticed an issue. The specific mode1 was asked for by our networks team. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan Sent: 21 July 2016 11:26 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience Dear all, Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. 
We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. Is there a recommended bonding mode? If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? Thank you, Regards, Ash -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ashish.thandavan at cs.ox.ac.uk Fri Jul 22 09:57:02 2016 From: ashish.thandavan at cs.ox.ac.uk (Ashish Thandavan) Date: Fri, 22 Jul 2016 09:57:02 +0100 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> Hi Richard, Thank you, that is very good to know! Regards, Ash On 22/07/16 09:36, Sobey, Richard A wrote: > Hi Ash > > Our ifcfg files for the bonded interfaces (this applies to GPFS, data and mgmt networks) are set to mode1: > > BONDING_OPTS="mode=1 miimon=200" > > If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it setup right as we've never noticed an issue. The specific mode1 was asked for by our networks team. > > Richard > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan > Sent: 21 July 2016 11:26 > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience > > Dear all, > > Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? > > I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? > > Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. 
However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. > Is there a recommended bonding mode? > > If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? > > Thank you, > > Regards, > Ash > > > > -- > ------------------------- > Ashish Thandavan > > UNIX Support Computing Officer > Department of Computer Science > University of Oxford > Wolfson Building > Parks Road > Oxford OX1 3QD > > Phone: 01865 610733 > Email: ashish.thandavan at cs.ox.ac.uk > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk From mimarsh2 at vt.edu Fri Jul 22 15:39:55 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 22 Jul 2016 10:39:55 -0400 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> Message-ID: Sort of trailing on this thread - Is a bonded active-active 10gig ethernet network enough bandwidth to run data and heartbeat/admin on the same network? I assume it comes down to a question of latency and congestion but would like to hear others' stories. Is anyone doing anything fancy with QOS to make sure admin/heartbeat traffic is not delayed? All of our current clusters use Infiniband for data and mgt traffic, but we are building a cluster that has dual 10gigE to each compute node. The NSD servers have 40gigE connections to the core network where 10gigE switches uplink. On Fri, Jul 22, 2016 at 4:57 AM, Ashish Thandavan < ashish.thandavan at cs.ox.ac.uk> wrote: > Hi Richard, > > Thank you, that is very good to know! > > Regards, > Ash > > > On 22/07/16 09:36, Sobey, Richard A wrote: > >> Hi Ash >> >> Our ifcfg files for the bonded interfaces (this applies to GPFS, data and >> mgmt networks) are set to mode1: >> >> BONDING_OPTS="mode=1 miimon=200" >> >> If we have ever had a network outage on the ports for these interfaces, >> apart from pulling a cable for testing when they went in, then I guess we >> have it setup right as we've never noticed an issue. The specific mode1 was >> asked for by our networks team. >> >> Richard >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org [mailto: >> gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan >> Sent: 21 July 2016 11:26 >> To: gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] GPFS heartbeat network specifications and >> resilience >> >> Dear all, >> >> Please could anyone be able to point me at specifications required for >> the GPFS heartbeat network? 
Are there any figures for latency, jitter, etc >> that one should be aware of? >> >> I also have a related question about resilience. Our three GPFS NSD >> servers utilize a single network port on each server and communicate >> heartbeat traffic over a private VLAN. We are looking at improving the >> resilience of this setup by adding an additional network link on each >> server (going to a different member of a pair of stacked switches than the >> existing one) and running the heartbeat network over bonded interfaces on >> the three servers. Are there any recommendations as to which network >> bonding type to use? >> >> Based on the name alone, Mode 1 (active-backup) appears to be the ideal >> choice, and I believe the switches do not need any special configuration. >> However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might >> be the way to go; this aggregates the two ports and does require the >> relevant switch ports to be configured to support this. >> Is there a recommended bonding mode? >> >> If anyone here currently uses bonded interfaces for their GPFS heartbeat >> traffic, may I ask what type of bond have you configured? Have you had any >> problems with the setup? And more importantly, has it been of use in >> keeping the cluster up and running in the scenario of one network link >> going down? >> >> Thank you, >> >> Regards, >> Ash >> >> >> >> -- >> ------------------------- >> Ashish Thandavan >> >> UNIX Support Computing Officer >> Department of Computer Science >> University of Oxford >> Wolfson Building >> Parks Road >> Oxford OX1 3QD >> >> Phone: 01865 610733 >> Email: ashish.thandavan at cs.ox.ac.uk >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > ------------------------- > Ashish Thandavan > > UNIX Support Computing Officer > Department of Computer Science > University of Oxford > Wolfson Building > Parks Road > Oxford OX1 3QD > > Phone: 01865 610733 > Email: ashish.thandavan at cs.ox.ac.uk > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Fri Jul 22 17:25:49 2016 From: chekh at stanford.edu (Alex Chekholko) Date: Fri, 22 Jul 2016 09:25:49 -0700 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: <81342a19-1cec-14de-9d7f-176ff7511511@stanford.edu> Hi Ashish, Can you describe more about what problem you are trying to solve? And what failure mode you are trying to avoid? GPFS depends on uninterrupted network access between the cluster members (well, mainly between each cluster member and the current cluster manager node), but there are many ways to ensure that, and many ways to recover from interruptions. e.g. we tend to set minMissedPingTimeout 30 pingPeriod 5 Bump those up if network/system gets busy. Performance and latency will suffer but at least cluster members won't be expelled. 
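A hedged sketch of applying the values Alex mentions cluster-wide; these knobs trade faster failure detection for tolerance of network hiccups, so treat the numbers as examples rather than recommendations:

  # allow more missed pings before a node is declared dead
  mmchconfig minMissedPingTimeout=30,pingPeriod=5

  # confirm the values now recorded in the cluster configuration
  mmlsconfig | grep -i ping

Depending on the release, changes like these may only take effect after the GPFS daemon is recycled on the affected nodes.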
Regards, Alex On 07/21/2016 03:26 AM, Ashish Thandavan wrote: > Dear all, > > Please could anyone be able to point me at specifications required for > the GPFS heartbeat network? Are there any figures for latency, jitter, > etc that one should be aware of? > > I also have a related question about resilience. Our three GPFS NSD > servers utilize a single network port on each server and communicate > heartbeat traffic over a private VLAN. We are looking at improving the > resilience of this setup by adding an additional network link on each > server (going to a different member of a pair of stacked switches than > the existing one) and running the heartbeat network over bonded > interfaces on the three servers. Are there any recommendations as to > which network bonding type to use? > > Based on the name alone, Mode 1 (active-backup) appears to be the ideal > choice, and I believe the switches do not need any special > configuration. However, it has been suggested that Mode 4 (802.3ad) or > LACP bonding might be the way to go; this aggregates the two ports and > does require the relevant switch ports to be configured to support this. > Is there a recommended bonding mode? > > If anyone here currently uses bonded interfaces for their GPFS heartbeat > traffic, may I ask what type of bond have you configured? Have you had > any problems with the setup? And more importantly, has it been of use in > keeping the cluster up and running in the scenario of one network link > going down? > > Thank you, > > Regards, > Ash > > > -- Alex Chekholko chekh at stanford.edu From volobuev at us.ibm.com Fri Jul 22 18:56:31 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Fri, 22 Jul 2016 10:56:31 -0700 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: There are multiple ways to accomplish active-active two-side synchronous DR, aka "stretch cluster". The most common approach is to have 3 sites: two main sites A and B, plus tiebreaker site C. The two main sites host all data/metadata disks and each has some even number of quorum nodes. There's no stretched SAN, each site has its own set of NSDs defined. The tiebreaker site consists of a single quorum node with a small descOnly LUN. In this config, any of the 3 sites can do down or be disconnected from the rest without affecting the other two. The tiebreaker site is essential: it provides a quorum node for node majority quorum to function, and a descOnly disk for the file system descriptor quorum. Technically speaking, one do away with the need to have a quorum node at site C by using "minority quorum", i.e. tiebreaker disks, but this model is more complex and it is harder to predict its behavior under various failure conditions. The basic problem with the minority quorum is that it allows a minority of nodes to win in a network partition scenario, just like the name implies. In the extreme case this leads to the "dictator problem", when a single partitioned node could manage to win the disk election and thus kick everyone else out. And since a tiebreaker disk needs to be visible from all quorum nodes, you do need a stretched SAN that extends between sites. The classic active-active stretch cluster only requires a good TCP/IP network. The question that gets asked a lot is "how good should be network connection between sites be". 
There's no simple answer, unfortunately. It would be completely impractical to try to frame this in simple thresholds. The worse the network connection is, the more pain it produces, but everyone has a different level of pain tolerance. And everyone's workload is different. In any GPFS configuration that uses data replication, writes are impacted far more by replication than reads. So a read-mostly workload may run fine with a dodgy inter-site link, while a write-heavy workload may just run into the ground, as IOs may be submitted faster than they could be completed. The buffering model could make a big difference. An application that does a fair amount of write bursts, with those writes being buffered in a generously sized pagepool, may perform acceptably, while a different application that uses O_SYNC or O_DIRECT semantics for writes may run a lot worse, all other things being equal. As long as all nodes can renew their disk leases within the configured disk lease interval (35 sec by default), GPFS will basically work, so the absolute threshold for the network link quality is not particularly stringent, but beyond that it all depends on your workload and your level of pain tolerance. Practically speaking, you want a network link with low-double-digits RTT at worst, almost no packet loss, and bandwidth commensurate with your application IO needs (fudged some to allow for write amplification -- another factor that's entirely workload-dependent). So a link with, say, 100ms RTT and 2% packet loss is not going to be usable to almost anyone, in my opinion, a link with 30ms RTT and 0.1% packet loss may work for some undemanding read-mostly workloads, and so on. So you pretty much have to try it out to see. The disk configuration is another tricky angle. The simplest approach is to have two groups of data/metadata NSDs, on sites A and B, and not have any sort of SAN reaching across sites. Historically, such a config was actually preferred over a stretched SAN, because it allowed for a basic site topology definition. When multiple replicas of the same logical block are present, it is obviously better/faster to read the replica that resides on a disk that's local to a given site. This is conceptually simple, but how would GPFS know what a site is and what disks are local vs remote? To GPFS, all disks are equal. Historically, the readReplicaPolicy=local config parameter was put forward to work around the problem. The basic idea was: if the reader node is on the same subnet as the primary NSD server for a given replica, this replica is "local", and is thus preferred. This sort of works, but requires a very specific network configuration, which isn't always practical. Starting with GPFS 4.1.1, GPFS implements readReplicaPolicy=fastest, where the best replica for reads is picked based on observed disk IO latency. This is more general and works for all disk topologies, including a stretched SAN. yuri From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list , Date: 07/21/2016 05:45 AM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). 
I know I?m being persistent but this for some reason confuses me. Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. 
Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216

This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.

Sirius Computer Solutions
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From dhildeb at us.ibm.com Fri Jul 22 19:00:23 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 22 Jul 2016 11:00:23 -0700 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID:

Just to expand a bit on the use of peer snapshots. The point of psnap is to create a snapshot in the cache that is identical to a snapshot on the home. This way you can recover files from a snapshot of a fileset on the 'replica' of the data just like you can from a snapshot in the 'cache' (where the data was generated). With IW mode, it's typically possible that the data could be changing on the home from another cache or from clients running directly against the data on the home. In this case, it would be impossible to ensure that the snapshots in the cache and on the home are identical.

Dean

From: "Shankar Balasubramanian" To: gpfsug main discussion list Date: 07/21/2016 05:52 PM Subject: Re: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org

Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network.
Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India Inactive hide details for Luke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM anLuke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just k From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From volobuev at us.ibm.com Fri Jul 22 20:24:58 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Fri, 22 Jul 2016 12:24:58 -0700 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Message-ID: In a word, no. I can't blame anyone for suspecting that there's yet another hidden flag somewhere, given our track record, but there's nothing hidden on this one, there's just no code to implement O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be a reasonable thing to have, so if you feel strongly enough about it to open an RFE, go for it. yuri From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list , Date: 07/21/2016 09:05 AM Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in particular the putacl and getacl functions) have no support for not following symlinks. Is there some hidden support for gpfs_putacl that will cause it to not deteference symbolic links? Something like the O_NOFOLLOW flag used elsewhere in linux? Thanks! -Aaron_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Fri Jul 22 23:36:46 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 22 Jul 2016 18:36:46 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Message-ID: <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Thanks Yuri! I do wonder what security implications this might have for the policy engine where a nefarious user could trick it into performing an action on another file via symlink hijacking. Truthfully I've been more worried about an accidental hijack rather than someone being malicious. I'll open an RFE for it since I think it would be nice to have. (While I'm at it, I think I'll open another for having chown call exposed via the API). -Aaron On 7/22/16 3:24 PM, Yuri L Volobuev wrote: > In a word, no. I can't blame anyone for suspecting that there's yet > another hidden flag somewhere, given our track record, but there's > nothing hidden on this one, there's just no code to implement > O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be > a reasonable thing to have, so if you feel strongly enough about it to > open an RFE, go for it. > > yuri > > Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER > SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, > Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 > AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) > and API calls (in particular the > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list , > Date: 07/21/2016 09:05 AM > Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in > particular the putacl and getacl functions) have no support for not > following symlinks. Is there some hidden support for gpfs_putacl that > will cause it to not deteference symbolic links? Something like the > O_NOFOLLOW flag used elsewhere in linux? > > Thanks! > > -Aaron_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: OpenPGP digital signature URL: From aaron.s.knister at nasa.gov Sat Jul 23 05:46:30 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sat, 23 Jul 2016 00:46:30 -0400 Subject: [gpfsug-discuss] inode update delay? Message-ID: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. 
I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From shankbal at in.ibm.com Fri Jul 22 08:53:51 2016 From: shankbal at in.ibm.com (Shankar Balasubramanian) Date: Fri, 22 Jul 2016 13:23:51 +0530 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: One correction to the note below, peer snapshots are not supported when AFM use GPFS protocol. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India From: Shankar Balasubramanian/India/IBM at IBMIN To: gpfsug main discussion list Date: 07/22/2016 06:22 AM Subject: Re: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India Inactive hide details for Luke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM anLuke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just k From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 
1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From MKEIGO at jp.ibm.com Sun Jul 24 03:31:05 2016 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Sun, 24 Jul 2016 11:31:05 +0900 Subject: [gpfsug-discuss] inode update delay? In-Reply-To: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: Hi Aaron, I think the product is designed so that some inode fields are not propagated among nodes instantly in order to avoid unnecessary overhead within the cluster. See: Exceptions to Open Group technical standards - IBM Spectrum Scale: Administration and Programming Reference - IBM Spectrum Scale 4.2 - IBM Knowledge Center https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_xopen.htm --- Keigo Matsubara, Industry Architect, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 From: Aaron Knister To: Date: 2016/07/23 13:47 Subject: [gpfsug-discuss] inode update delay? Sent by: gpfsug-discuss-bounces at spectrumscale.org I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stef.coene at docum.org Sun Jul 24 11:27:28 2016 From: stef.coene at docum.org (Stef Coene) Date: Sun, 24 Jul 2016 12:27:28 +0200 Subject: [gpfsug-discuss] New to GPFS Message-ID: <57949810.2030002@docum.org> Hi, Like the subject says, I'm new to Spectrum Scale. We are considering GPFS as back end for CommVault back-up data. Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client (Ubuntu) as test on ESXi 6. The RHEL servers are upgraded to 7.2. Will that be a problem or not? I saw some posts that there is an issue with RHEL 7.2.... Stef From makaplan at us.ibm.com Sun Jul 24 16:11:06 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 24 Jul 2016 11:11:06 -0400 Subject: [gpfsug-discuss] inode update delay? / mmapplypolicy In-Reply-To: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: mmapplypolicy uses the inodescan API which to gain overall speed, bypasses various buffers, caches, locks, ... and just reads inodes "directly" from disk. So the "view" of inodescan is somewhat "behind" the overall state of the live filesystem as viewed from the usual Posix APIs, such as stat(2). (Not to worry, all metadata updates are logged, so in event of a power loss or OS crash, GPFS recovers a consistent state from its log files...) This is at least mentioned in the docs. `mmfsctl suspend-write; mmfsctl resume;` is the only practical way I know to guarantee a forced a flush of all "dirty" buffers to disk -- any metadata updates before the suspend will for sure become visible to an inodescan after the resume. (Classic `sync` is not quite the same...) But think about this --- scanning a "live" file system is always somewhat iffy-dodgy and the result is smeared over the time of the scan -- if there are any concurrent changes during the scan your results are imprecise. An alternative is to use `mmcrsnapshot` and scan the snapshot. From: Aaron Knister To: Date: 07/23/2016 12:46 AM Subject: [gpfsug-discuss] inode update delay? Sent by: gpfsug-discuss-bounces at spectrumscale.org I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! 
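To make the mmcrsnapshot suggestion above concrete, here is a minimal sketch. The file system name "fs0", the snapshot name "aclscan", and the mount path are placeholders, and it assumes the tsinode sample will resolve the snapshot from the path it is given (global snapshots appear under .snapshots at the file system root):

# Create a global snapshot so the inode scan sees one quiesced point in time.
mmcrsnapshot fs0 aclscan

# Scan the snapshot instead of the live file system.
/usr/lpp/mmfs/samples/util/tsinode /gpfs/fs0/.snapshots/aclscan | egrep acl | wc -l

# Remove the snapshot once the scan is finished.
mmdelsnapshot fs0 aclscan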
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Jul 24 16:54:16 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 24 Jul 2016 11:54:16 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Message-ID: Regarding "policy engine"/inodescan and symbolic links. 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be tested to see if an inode/file is a symlink or not. 2. Default behaviour for mmapplypolicy is to skip over symlinks. You must specify... DIRECTORIES_PLUS which ... Indicates that non-regular file objects (directories, symbolic links, and so on) should be included in the list. If not specified, only ordinary data files are included in the candidate lists. 3. You can apply Linux commands and APIs to GPFS pathnames. 4. Of course, if you need to get at a GPFS feature or attribute that is not supported by Linux ... 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, but neither does it set the ACL for the symlink... Googling... some people consider this to be a bug, but maybe it is a feature... --marc From: Aaron Knister To: Date: 07/22/2016 06:37 PM Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Yuri! I do wonder what security implications this might have for the policy engine where a nefarious user could trick it into performing an action on another file via symlink hijacking. Truthfully I've been more worried about an accidental hijack rather than someone being malicious. I'll open an RFE for it since I think it would be nice to have. (While I'm at it, I think I'll open another for having chown call exposed via the API). -Aaron On 7/22/16 3:24 PM, Yuri L Volobuev wrote: > In a word, no. I can't blame anyone for suspecting that there's yet > another hidden flag somewhere, given our track record, but there's > nothing hidden on this one, there's just no code to implement > O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be > a reasonable thing to have, so if you feel strongly enough about it to > open an RFE, go for it. > > yuri > > Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER > SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, > Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 > AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) > and API calls (in particular the > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list , > Date: 07/21/2016 09:05 AM > Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in > particular the putacl and getacl functions) have no support for not > following symlinks. 
Is there some hidden support for gpfs_putacl that > will cause it to not deteference symbolic links? Something like the > O_NOFOLLOW flag used elsewhere in linux? > > Thanks! > > -Aaron_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Mon Jul 25 00:15:02 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 24 Jul 2016 23:15:02 +0000 Subject: [gpfsug-discuss] New to GPFS In-Reply-To: <57949810.2030002@docum.org> References: <57949810.2030002@docum.org> Message-ID: <43489850f79c446c9d9896292608a292@exch1-cdc.nexus.csiro.au> The issue is with the Protocols version of GPFS. I am using the non-protocols version 4.2.0.3 successfully on CentOS 7.2. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stef Coene Sent: Sunday, 24 July 2016 8:27 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] New to GPFS Hi, Like the subject says, I'm new to Spectrum Scale. We are considering GPFS as back end for CommVault back-up data. Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client (Ubuntu) as test on ESXi 6. The RHEL servers are upgraded to 7.2. Will that be a problem or not? I saw some posts that there is an issue with RHEL 7.2.... Stef _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mweil at wustl.edu Mon Jul 25 15:56:52 2016 From: mweil at wustl.edu (Matt Weil) Date: Mon, 25 Jul 2016 09:56:52 -0500 Subject: [gpfsug-discuss] New to GPFS In-Reply-To: <57949810.2030002@docum.org> References: <57949810.2030002@docum.org> Message-ID: On 7/24/16 5:27 AM, Stef Coene wrote: > Hi, > > Like the subject says, I'm new to Spectrum Scale. > > We are considering GPFS as back end for CommVault back-up data. > Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). > I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client > (Ubuntu) as test on ESXi 6. > > The RHEL servers are upgraded to 7.2. Will that be a problem or not? > I saw some posts that there is an issue with RHEL 7.2.... we had to upgrade to 4.2.0.3 when running RHEL 7.2 > > > Stef > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From aaron.s.knister at nasa.gov Mon Jul 25 20:50:54 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 25 Jul 2016 15:50:54 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Message-ID: <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> Thanks Marc. In my mind the issue is a timing one between the moment the policy engine decides to perform an action on a file (e.g. matching the path inode/gen number with that from the inode scan) and when it actually takes that action by calling an api call that takes a path as an argument. Your suggestion in #3 is the route I think I'm going to take here since I can call acl_get_fd after calling open/openat with O_NOFOLLOW. On 7/24/16 11:54 AM, Marc A Kaplan wrote: > Regarding "policy engine"/inodescan and symbolic links. > > 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be > tested to see if an inode/file is a symlink or not. > > 2. Default behaviour for mmapplypolicy is to skip over symlinks. You > must specify... > > *DIRECTORIES_PLUS which ...* > > Indicates that non-regular file objects (directories, symbolic links, > and so on) should be included in > the list. If not specified, only ordinary data files are included in the > candidate lists. > > 3. You can apply Linux commands and APIs to GPFS pathnames. > > 4. Of course, if you need to get at a GPFS feature or attribute that is > not supported by Linux ... > > 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, > but neither does it set the ACL for the symlink... > Googling... some people consider this to be a bug, but maybe it is a > feature... > > --marc > > > > From: Aaron Knister > To: > Date: 07/22/2016 06:37 PM > Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Thanks Yuri! I do wonder what security implications this might have for > the policy engine where a nefarious user could trick it into performing > an action on another file via symlink hijacking. Truthfully I've been > more worried about an accidental hijack rather than someone being > malicious. I'll open an RFE for it since I think it would be nice to > have. (While I'm at it, I think I'll open another for having chown call > exposed via the API). > > -Aaron > > On 7/22/16 3:24 PM, Yuri L Volobuev wrote: >> In a word, no. I can't blame anyone for suspecting that there's yet >> another hidden flag somewhere, given our track record, but there's >> nothing hidden on this one, there's just no code to implement >> O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be >> a reasonable thing to have, so if you feel strongly enough about it to >> open an RFE, go for it. >> >> yuri >> >> Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER >> SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, >> Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 >> AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) >> and API calls (in particular the >> >> From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" >> >> To: gpfsug main discussion list , >> Date: 07/21/2016 09:05 AM >> Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Everyone, >> >> I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in >> particular the putacl and getacl functions) have no support for not >> following symlinks. Is there some hidden support for gpfs_putacl that >> will cause it to not deteference symbolic links? Something like the >> O_NOFOLLOW flag used elsewhere in linux? >> >> Thanks! >> >> -Aaron_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Mon Jul 25 20:57:25 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 25 Jul 2016 15:57:25 -0400 Subject: [gpfsug-discuss] inode update delay? / mmapplypolicy In-Reply-To: References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: <291e1237-98d6-2abe-b1af-8898da61629f@nasa.gov> Thanks again, Marc. You're quite right about the results being smeared over time on a live filesystem even if the inodescan didn't lag behind slightly. The use case here is a mass uid number migration. File ownership is easy because I can be guaranteed after a certain point in time that no new files under the user's old uid number can be created. However, in part because of inheritance I'm not so lucky when it comes to ACLs. I almost need to do 2 passes when looking at the ACLs but even that's not guaranteed to catch everything. Using a snapshot is an interesting idea to give me a stable point in time snapshot to determine if I got everything. -Aaron On 7/24/16 11:11 AM, Marc A Kaplan wrote: > mmapplypolicy uses the inodescan API which to gain overall speed, > bypasses various buffers, caches, locks, ... and just reads inodes > "directly" from disk. > > So the "view" of inodescan is somewhat "behind" the overall state of the > live filesystem as viewed from the usual Posix APIs, such as stat(2). > (Not to worry, all metadata updates are logged, so in event of a power > loss or OS crash, GPFS recovers a consistent state from its log files...) > > This is at least mentioned in the docs. 
> > `mmfsctl suspend-write; mmfsctl resume;` is the only practical way I > know to guarantee a forced a flush of all "dirty" buffers to disk -- any > metadata updates before the suspend will for sure > become visible to an inodescan after the resume. (Classic `sync` is not > quite the same...) > > But think about this --- scanning a "live" file system is always > somewhat iffy-dodgy and the result is smeared over the time of the scan > -- if there are any concurrent changes > during the scan your results are imprecise. > > An alternative is to use `mmcrsnapshot` and scan the snapshot. > > > > > From: Aaron Knister > To: > Date: 07/23/2016 12:46 AM > Subject: [gpfsug-discuss] inode update delay? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > I've noticed that there can be a several minute delay between the time > changes to an inode occur and when those changes are reflected in the > results of an inode scan. I've been working on code that checks ia_xperm > to determine if a given file has extended acl entries and noticed in > testing it that the acl flag wasn't getting set immediately after giving > a file an acl. Here's what I mean: > > # cd /gpfsm/dnb32 > > # date; setfacl -b acltest* > Sat Jul 23 00:24:57 EDT 2016 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:24:59 EDT 2016 > 5 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:25:10 EDT 2016 > 5 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:25:21 EDT 2016 > 0 > > I'm a little confused about what's going on here-- is there some kind of > write-behind for inode updates? Is there a way I can cause the cluster > to quiesce and flush all pending inode updates (an mmfsctl suspend and > resume seem to have this effect but I was looking for something a little > less user-visible)? If I access the directory containing the files from > another node via the VFS mount then the update appears immediately in > the inode scan. A mere inode scan from another node w/o touching the > filesystem mount doesn't necessarily seem to trigger this behavior. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From makaplan at us.ibm.com Mon Jul 25 22:46:01 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 25 Jul 2016 17:46:01 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov><9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> Message-ID: Unfortunately there is always a window of time between testing the file and acting on the file's pathname. At any moment after testing (finding) ... the file could change, or the same pathname could be pointing to a different inode/file. 
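A hedged sketch of the list-then-act pattern being discussed, drawing on Marc's earlier notes about DIRECTORIES_PLUS and the MODE variable and on the -e retest option he mentions next. The rule names, the helper script, and the assumption that symlink MODE strings begin with 'l' are illustrative, not taken from the documentation:

# Hypothetical policy file: hand candidates to an external script, include
# non-regular objects, but keep the symlinks themselves out of the list.
cat > /tmp/acl_fixup.pol <<'EOF'
RULE 'ext'        EXTERNAL LIST 'acl_fixup' EXEC '/usr/local/bin/fix_acls.sh'
RULE 'candidates' LIST 'acl_fixup'
     DIRECTORIES_PLUS
     WHERE MODE NOT LIKE 'l%'   /* assumes symlink mode strings start with 'l' */
EOF

# -e re-evaluates each candidate just before the EXEC script runs, which
# narrows (but does not eliminate) the window described above. Paths are made up.
mmapplypolicy /gpfs/fs0 -P /tmp/acl_fixup.pol -e -I yes

The external script can re-validate each entry (for example with lstat and an fd-based ACL call) before acting on it.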
That is a potential problem with just about every Unix file utility and/or script you put together with the standard commands... find ... | xargs ... mmapplypolicy has the -e option to narrow the window by retesting just before executing an action. Of course it's seldom a real problem -- you have to think about scenarios where two minds are working within the same namespace of files and then they are doing so either carelessly without communicating or one is deliberately trying to cause trouble for the other! From: Aaron Knister To: Date: 07/25/2016 03:51 PM Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Marc. In my mind the issue is a timing one between the moment the policy engine decides to perform an action on a file (e.g. matching the path inode/gen number with that from the inode scan) and when it actually takes that action by calling an api call that takes a path as an argument. Your suggestion in #3 is the route I think I'm going to take here since I can call acl_get_fd after calling open/openat with O_NOFOLLOW. On 7/24/16 11:54 AM, Marc A Kaplan wrote: > Regarding "policy engine"/inodescan and symbolic links. > > 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be > tested to see if an inode/file is a symlink or not. > > 2. Default behaviour for mmapplypolicy is to skip over symlinks. You > must specify... > > *DIRECTORIES_PLUS which ...* > > Indicates that non-regular file objects (directories, symbolic links, > and so on) should be included in > the list. If not specified, only ordinary data files are included in the > candidate lists. > > 3. You can apply Linux commands and APIs to GPFS pathnames. > > 4. Of course, if you need to get at a GPFS feature or attribute that is > not supported by Linux ... > > 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, > but neither does it set the ACL for the symlink... > Googling... some people consider this to be a bug, but maybe it is a > feature... > > --marc > > > > From: Aaron Knister > To: > Date: 07/22/2016 06:37 PM > Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Thanks Yuri! I do wonder what security implications this might have for > the policy engine where a nefarious user could trick it into performing > an action on another file via symlink hijacking. Truthfully I've been > more worried about an accidental hijack rather than someone being > malicious. I'll open an RFE for it since I think it would be nice to > have. (While I'm at it, I think I'll open another for having chown call > exposed via the API). > > -Aaron > > On 7/22/16 3:24 PM, Yuri L Volobuev wrote: >> In a word, no. I can't blame anyone for suspecting that there's yet >> another hidden flag somewhere, given our track record, but there's >> nothing hidden on this one, there's just no code to implement >> O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be >> a reasonable thing to have, so if you feel strongly enough about it to >> open an RFE, go for it. >> >> yuri >> >> Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER >> SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, >> Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 >> AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) >> and API calls (in particular the >> >> From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" >> >> To: gpfsug main discussion list , >> Date: 07/21/2016 09:05 AM >> Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Everyone, >> >> I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in >> particular the putacl and getacl functions) have no support for not >> following symlinks. Is there some hidden support for gpfs_putacl that >> will cause it to not deteference symbolic links? Something like the >> O_NOFOLLOW flag used elsewhere in linux? >> >> Thanks! >> >> -Aaron_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Tue Jul 26 15:17:35 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 26 Jul 2016 14:17:35 +0000 Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
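For context on the prefetch step described above, this kind of warm-up is usually driven with mmafmctl prefetch and a list file. The file system name "cacheB", the fileset name "bulk", and the paths below are invented for illustration:

# Build a list of files to warm into the cache fileset, then queue the
# prefetch on the gateway.
find /gpfs/cacheB/bulk/dataset01 -type f > /tmp/prefetch.list
mmafmctl cacheB prefetch -j bulk --list-file /tmp/prefetch.list

# Check the fileset and gateway queue state while the prefetch drains.
mmafmctl cacheB getstate -j bulk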
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From bbanister at jumptrading.com Wed Jul 27 18:37:37 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 17:37:37 +0000 Subject: [gpfsug-discuss] CCR troubles Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I'll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Jul 27 19:03:05 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 27 Jul 2016 14:03:05 -0400 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. 
can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Jul 27 23:29:19 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 22:29:19 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... 
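A quick way to check Marc's point on a disposable test cluster. The sequence is only a sketch: it assumes a majority of the quorum nodes stay reachable over TCP/IP throughout and that the CCR configuration service (mmsdrserv) is still running on them, as Sanjay Gandhi notes further down in the thread:

# Stop the daemon cluster-wide; this stops mmfsd, not the configuration service.
mmshutdown -a
mmgetstate -aL      # nodes should simply report "down", not a CCR quorum error

# Configuration queries and changes should still succeed with the daemon stopped:
mmlsconfig
mmchconfig worker1Threads=128,prefetchThreads=128

mmstartup -a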
From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmcpheeters at anl.gov Wed Jul 27 23:34:50 2016 From: gmcpheeters at anl.gov (McPheeters, Gordon) Date: Wed, 27 Jul 2016 22:34:50 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. 
Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Wed Jul 27 23:44:27 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 22:44:27 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Right, I know that I can disable CCR, and I?m asking if this seemingly broken behavior of GPFS commands when the cluster is down was the expected mode of operation with CCR enabled. Sounds like it from the responses thus far. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McPheeters, Gordon Sent: Wednesday, July 27, 2016 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... 
From: Bryan Banister To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched for CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, without finding any explanation. If so, then maybe I'll go back to non CCR, -Bryan
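For completeness, the fallback Gordon describes above amounts to roughly the following sequence. This is a sketch, not a recommendation: mmstartup does not appear in this thread and is assumed here, and depending on the release mmchcluster --ccr-disable may also require naming primary/secondary configuration servers, so check the mmchcluster man page first.
# All nodes must be shut down before CCR can be disabled (per Gordon's note above).
mmshutdown -a
mmchcluster --ccr-disable    # may also need -p/-s server nodes on some releases (assumption)
mmstartup -a                 # mmstartup assumed; verify for your environment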
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsanjay at us.ibm.com Thu Jul 28 00:04:35 2016 From: gsanjay at us.ibm.com (Sanjay Gandhi) Date: Wed, 27 Jul 2016 16:04:35 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 54, Issue 63 In-Reply-To: References: Message-ID: Check mmsdrserv is running on all quorum nodes. mmlscluster should start mmsdrserv if it is not running. Thanks, Sanjay Gandhi GPFS FVT IBM, Beaverton Phone/FAX : 503-578-4141 T/L 775-4141 gsanjay at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 03:44 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 63 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: CCR troubles (Bryan Banister) ---------------------------------------------------------------------- Message: 1 Date: Wed, 27 Jul 2016 22:44:27 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D at CHI-EXCHANGEW1.w2k.jumptrading.com> Content-Type: text/plain; charset="utf-8" Right, I know that I can disable CCR, and I?m asking if this seemingly broken behavior of GPFS commands when the cluster is down was the expected mode of operation with CCR enabled. Sounds like it from the responses thus far. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McPheeters, Gordon Sent: Wednesday, July 27, 2016 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. 
I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com< http://fpia-gpfs-jcsdr01.grid.jumptrading.com/>. mmchconfig: Command failed. Examine previous error messages to determine cause. 
Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160727/ea365c46/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 63 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Thu Jul 28 05:23:34 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 28 Jul 2016 06:23:34 +0200 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: From radhika.p at in.ibm.com Thu Jul 28 06:43:13 2016 From: radhika.p at in.ibm.com (Radhika A Parameswaran) Date: Thu, 28 Jul 2016 11:13:13 +0530 Subject: [gpfsug-discuss] Re. AFM Crashing the MDS In-Reply-To: References: Message-ID: Luke, AFM is not tested for cascading configurations, this is getting added into the documentation for 4.2.1: "Cascading of AFM caches is not tested." Thanks and Regards Radhika From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 04:30 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 59 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. AFM Crashing the MDS (Luke Raimbach) ---------------------------------------------------------------------- Message: 1 Date: Tue, 26 Jul 2016 14:17:35 +0000 From: Luke Raimbach To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Content-Type: text/plain; charset="utf-8" Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 59 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Thu Jul 28 09:30:59 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 28 Jul 2016 08:30:59 +0000 Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Dear Radhika, In the early days of AFM and at two separate GPFS UK User Group meetings, I discussed AFM cache chaining with IBM technical people plus at least one developer. My distinct recollection of the outcome was that cache chaining was supported. Nevertheless, the difference between what my memory tells me and what is being reported now is irrelevant. We are stuck with large volumes of data being migrated in this fashion, so there is clearly a customer use case for chaining AFM caches. It would be much more helpful if IBM could take on this case and look at the suspected bug that's been chased out here. Real world observation in the field is that queuing large numbers of metadata updates on the MDS itself causes this crash, whereas issuing the updates from another node in the cache cluster adds to the MDS queue and the crash does not happen. My guess is that there is a bug whereby daemon-local additions to the MDS queue aren't handled correctly (further speculation is that there is a memory leak for local MDS operations, but that needs more testing which I don't have time for - perhaps IBM could try it out?); however, when a metadata update operation is sent through an RPC from another node, it is added to the queue and handled correctly. A workaround, if you will. Other minor observations here are that the further down the chain of caches you are, the larger you should set afmDisconnectTimeout as any intermediate cache recovery time needs to be taken into account following a disconnect event. Initially, this was slightly counterintuitive because caches B and C as described below are connected over multiple IB interfaces and shouldn't disconnect except when there's some other failure. Conversely, the connection between cache A and B is over a very flaky wide area network and although we've managed to tune out a lot of the problems introduced by high and variable latency, the view of cache A from cache B's perspective still sometimes gets suspended. The failure observed above doesn't really feel like it's an artefact of cascading caches, but a bug in MDS code as described. Sharing background information about the cascading cache setup was in the spirit of the mailing list and might have led IBM or other customers attempting this kind of setup to share some of their experiences. Hope you can help. Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Radhika A Parameswaran Sent: 28 July 2016 06:43 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Re. 
AFM Crashing the MDS Luke, AFM is not tested for cascading configurations, this is getting added into the documentation for 4.2.1: "Cascading of AFM caches is not tested." Thanks and Regards Radhika From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 04:30 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 59 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. AFM Crashing the MDS (Luke Raimbach) ---------------------------------------------------------------------- Message: 1 Date: Tue, 26 Jul 2016 14:17:35 +0000 From: Luke Raimbach > To: gpfsug main discussion list > Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: > Content-Type: text/plain; charset="utf-8" Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: [the crash log and traceback are identical to those in Luke's original message, quoted in full earlier in this digest] Ugly, right? Cheers, Luke. Luke Raimbach Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE.
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 59 ********************************************** The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radhika.p at in.ibm.com Thu Jul 28 16:04:44 2016 From: radhika.p at in.ibm.com (Radhika A Parameswaran) Date: Thu, 28 Jul 2016 20:34:44 +0530 Subject: [gpfsug-discuss] AFM Crashing the MDS In-Reply-To: References: Message-ID: Hi Luke, We are explicitly adding cascading to the 4.2.1 documentation as not tested, as we saw few issues during our in-house testing and the tests are not complete. With specific to this use case, we can give it a try and get back to your personal id. Thanks and Regards Radhika -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Thu Jul 28 16:39:09 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Thu, 28 Jul 2016 11:39:09 -0400 Subject: [gpfsug-discuss] GPFS on Broadwell processor Message-ID: All, Is there anything special (BIOS option / kernel option) that needs to be done when running GPFS on a Broadwell powered NSD server? Thank you, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Thu Jul 28 16:48:06 2016 From: jamiedavis at us.ibm.com (James Davis) Date: Thu, 28 Jul 2016 15:48:06 +0000 Subject: [gpfsug-discuss] GPFS on Broadwell processor In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 18:24:52 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 13:24:52 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. 
[root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jul 28 18:57:53 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 17:57:53 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn't a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. 
I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? 
Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
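Tying together Sanjay's and Marc's points in this thread: the helper daemons do not need a separate start procedure; running almost any mm configuration command brings them back as a side effect, which you can confirm with ps. A small sketch:
# Any configuration list command should restart the helpers under the covers.
mmlscluster > /dev/null
# Verify mmsdrserv and mmccrmonitor are back (run the same check via ssh on the other quorum nodes).
ps auxw | egrep 'mmsdrserv|mmccrmonitor' | grep -v grep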
URL: From bbanister at jumptrading.com Thu Jul 28 19:14:05 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 18:14:05 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Hi Marc, So this issue is actually caused by our Systemd setup. We have fully converted over to Systemd to manage the dependency chain needed for GPFS to start properly and also our scheduling system after that. The issue is that when we shutdown GPFS with Systemd this apparently is causing the mmsdrserv and mmccrmonitor processes to also be killed/term'd, probably because these are started in the same CGROUP as GPFS and Systemd kills all processes in this CGROUP when GPFS is stopped. Not sure how to proceed with safeguarding these daemons from Systemd... and real Systemd support in GPFS is basically non-existent at this point. So my problem is actually a Systemd problem, not a CCR problem! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, July 28, 2016 12:58 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn't a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. 
In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? 
Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
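One possible way to deal with the systemd behaviour Bryan describes above, assuming a site-written gpfs.service unit wraps mmstartup and mmshutdown, is to relax the unit's kill policy so that stopping it does not reap every process in its cgroup. The unit name and drop-in path below are assumptions; this is only a sketch, not a tested recipe:

# Hypothetical drop-in for a locally written gpfs.service unit.
# KillMode=process tells systemd to signal only the unit's main process on
# "systemctl stop", leaving helpers such as mmccrmonitor and mmsdrserv
# (started in the same cgroup) alone.
mkdir -p /etc/systemd/system/gpfs.service.d
cat > /etc/systemd/system/gpfs.service.d/50-keep-ccr.conf <<'EOF'
[Service]
KillMode=process
EOF
systemctl daemon-reload

The trade-off is that systemd will then leave any stray GPFS processes behind on stop, so the unit's ExecStop should still call mmshutdown explicitly.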
-------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 19:23:49 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 14:23:49 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: I think the idea is that you should not need to know the details of how ccr and sdrserv are implemented nor how they work. At this moment, I don't! Literally, I just installed GPFS and defined my system with mmcrcluster and so forth and "it just works". As I wrote, just running mmlscluster or mmlsconfig or similar configuration create, list, change, delete commands should start up ccr and sdrserv under the covers. Okay, now "I hear you" -- it ain't working for you today. Presumably it did a while ago? Let's think about that... Troubleshooting 0,1,2 in order of suspicion... 0. Check that you can ping and ssh from each quorum node to every other quorum node. Q*(Q-1) tests 1. Check that you have plenty of free space in /var on each quorum node. Hmmm... we're not talking huge, but see if /var/mmfs/tmp is filled with junk.... Before and After clearing most of that out I had and have: [root at bog-wifi ~]# du -shk /var/mmfs 84532 /var/mmfs ## clean all big and old files out of /var/mmfs/tmp [root at bog-wifi ~]# du -shk /var/mmfs 9004 /var/mmfs Because we know that /var/mmfs is where GPFS store configuration "stuff" - 2. Check that we have GPFS software correctly installed on each quorum node: rpm -qa gpfs.* | xargs rpm --verify From: Bryan Banister To: gpfsug main discussion list Date: 07/28/2016 01:58 PM Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Sent by: gpfsug-discuss-bounces at spectrumscale.org I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn?t have anything explaining the ?Not enough CCR quorum nodes available? or ?Unexpected error from ccr fget mmsdrfs? error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn?t a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I?m still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS? could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! 
-Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... 
[root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From oehmes at gmail.com Thu Jul 28 19:27:20 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 28 Jul 2016 11:27:20 -0700 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: they should get started as soon as you shutdown via mmshutdown could you check a node where the processes are NOT started and simply run mmshutdown on this node to see if they get started ? On Thu, Jul 28, 2016 at 10:57 AM, Bryan Banister wrote: > I now see that these mmccrmonitor and mmsdrserv daemons are required for > the CCR operations to work. This is just not clear in the error output. > Even the GPFS 4.2 Problem Determination Guide doesn?t have anything > explaining the ?Not enough CCR quorum nodes available? or ?Unexpected error > from ccr fget mmsdrfs? error messages. Thus there is no clear direction on > how to fix this issue from the command output, the man pages, nor the Admin > Guides. > > > > [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr > > No manual entry for mmccr > > > > There isn?t a help for mmccr either, but at least it does print some usage > info: > > > > [root at fpia-gpfs-jcsdr01 ~]# mmccr -h > > Unknown subcommand: '-h'Usage: mmccr subcommand common-options > subcommand-options... > > > > Subcommands: > > > > Setup and Initialization: > > [snip] > > > > I?m still not sure how to start these mmccrmonitor and mmsdrserv daemons > without starting GPFS? could you tell me how it would be possible? > > > > Thanks for sharing details about how this all works Marc, I do appreciate > your response! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of *Marc A Kaplan > *Sent:* Thursday, July 28, 2016 12:25 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig > commands fine with mmshutdown > > > > Based on experiments on my test cluster, I can assure you that you can > list and change GPFS configuration parameters with CCR enabled while GPFS > is down. > > I understand you are having a problem with your cluster, but you are > incorrectly disparaging the CCR. 
> > In fact you can mmshutdown -a AND kill all GPFS related processes, > including mmsdrserv and mmcrmonitor and then issue commands like: > > mmlscluster, mmlsconfig, mmchconfig > > Those will work correctly and by-the-way re-start mmsdrserv and > mmcrmonitor... > (Use command like `ps auxw | grep mm` to find the relevenat processes). > > But that will not start the main GPFS file manager process mmfsd. GPFS > "proper" remains down... > > For the following commands Linux was "up" on all nodes, but GPFS was > shutdown. > [root at n2 gpfs-git]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 down > 4 n5 down > 6 n3 down > > However if a majority of the quorum nodes can not be obtained, you WILL > see a sequence of messages like this, after a noticeable "timeout": > (For the following test I had three quorum nodes and did a Linux shutdown > on two of them...) > > [root at n2 gpfs-git]# mmlsconfig > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmlsconfig: Command failed. Examine previous error messages to determine > cause. > > [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 > mmchconfig: Unable to obtain the GPFS configuration file lock. > mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. > mmchconfig: Command failed. Examine previous error messages to determine > cause. > > [root at n2 gpfs-git]# mmgetstate -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmgetstate: Command failed. Examine previous error messages to determine > cause. > > HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it > should check! > > Then re-starting Linux... So I have two of three quorum nodes active, but > GPFS still down... > > ## From n2, login to node n3 that I just rebooted... > [root at n2 gpfs-git]# ssh n3 > Last login: Thu Jul 28 09:50:53 2016 from n2.frozen > > ## See if any mm processes are running? ... NOPE! > > [root at n3 ~]# ps auxw | grep mm > ps auxw | grep mm > root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep > --color=auto mm > > ## Check the state... notice n4 is powered off... 
> [root at n3 ~]# mmgetstate -a > mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 unknown > 4 n5 down > 6 n3 down > > ## Examine the cluster configuration > [root at n3 ~]# mmlscluster > mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: madagascar.frozen > GPFS cluster id: 7399668614468035547 > GPFS UID domain: madagascar.frozen > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: n2.frozen (not in use) > Secondary server: n4.frozen (not in use) > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------- > 1 n2.frozen 172.20.0.21 n2.frozen > quorum-manager-perfmon > 3 n4.frozen 172.20.0.23 n4.frozen > quorum-manager-perfmon > 4 n5.frozen 172.20.0.24 n5.frozen perfmon > 6 n3.frozen 172.20.0.22 n3.frozen > quorum-manager-perfmon > > ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd > > [root at n3 ~]# ps auxw | grep mm > ps auxw | grep mm > root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes > root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep > --color=auto mm > > ## Now I can mmchconfig ... while GPFS remains down. > > [root at n3 ~]# mmchconfig worker1Threads=1022 > mmchconfig worker1Threads=1022 > mmchconfig: Command successfully completed > mmchconfig: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: > mmsdrfs propagation started > Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation > completed; mmdsh rc=0 > > [root at n3 ~]# mmgetstate -a > mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 unknown > 4 n5 down > 6 n3 down > > ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. > [root at n3 ~]# ping -c 1 n4 > ping -c 1 n4 > PING n4.frozen (172.20.0.23) 56(84) bytes of data. > From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable > > --- n4.frozen ping statistics --- > 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms > > [root at n3 ~]# exit > exit > logout > Connection to n3 closed. > [root at n2 gpfs-git]# ps auwx | grep mm > root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep > --color=auto mm > root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 > root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python > /usr/lpp/mmfs/bin/mmsysmon.py > [root at n2 gpfs-git]# > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. 
> If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 19:39:48 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 14:39:48 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: My experiments show that any of the mmXXX commands that require ccr will start ccr and sdrserv. So unless you have a daeamon actively seeking and killing ccr, I don't see why systemd is a problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jul 28 19:44:28 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 18:44:28 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Yeah, not sure why yet but when I shutdown the cluster using our Systemd configuration this kills the daemons, but mmshutdown obviously doesn't. I'll dig into my problems with that. Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 1:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR vs systemd My experiments show that any of the mmXXX commands that require ccr will start ccr and sdrserv. So unless you have a daeamon actively seeking and killing ccr, I don't see why systemd is a problem. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 21:16:30 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 16:16:30 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Allow me to restate and demonstrate: Even if systemd or any explicit kill signals destroy any/all running mmcr* and mmsdr* processes, simply running mmlsconfig will fire up new mmcr* and mmsdr* processes. For example: ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes [root at n2 gpfs-git]# ps auwx | grep mm root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep --color=auto mm [root at n2 gpfs-git]# mmlsconfig Configuration data for cluster madagascar.frozen: ------------------------------------------------- clusterName madagascar.frozen ... worker1Threads 1022 adminMode central File systems in cluster madagascar.frozen: ------------------------------------------ /dev/mak /dev/x1 /dev/yy /dev/zz ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it restarts them! [root at n2 gpfs-git]# ps auwx | grep mm root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 10358 0.0 0.0 1700488 17636 ? Sl 12:58 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep --color=auto mm -------------- next part -------------- An HTML attachment was scrubbed... 
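Following Sven's earlier suggestion, a quick single-node check of whether the CCR helper daemons survive a plain mmshutdown, and whether an mm command brings them back as in Marc's demonstration above. The log path is the one visible on the mmsdrserv command line in the ps output; the rest is only a sketch:

# Stop GPFS on this node only and see whether the CCR helpers survive.
mmshutdown
sleep 10
ps auxw | grep -E 'mm(ccrmonitor|sdrserv)' | grep -v grep

# Any mm configuration command should bring them back if they were killed.
mmlsconfig > /dev/null
ps auxw | grep -E 'mm(ccrmonitor|sdrserv)' | grep -v grep

# See whether mmsdrserv logged anything around the time it disappeared.
tail -n 20 /var/adm/ras/mmsdrserv.log

Running the same sequence once after a plain mmshutdown and once after stopping GPFS through the local systemd unit should show whether something outside GPFS is reaping the daemons.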
URL: From aaron.s.knister at nasa.gov Thu Jul 28 22:29:22 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 28 Jul 2016 17:29:22 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <83afcc2a-699a-d0b8-4f89-5e9dd7d3370e@nasa.gov> Hi Marc, I've seen systemd be overly helpful (read: not at all helpful) when it observes state changing outside of its control. There was a bug I encountered with GPFS (although the real issue may have been systemd, but the fix was put into GPFS) by which GPFS filesystems would get unmounted a split second after they were mounted, by systemd. The fs would mount but systemd decided the /dev/$fs device wasn't "ready" so it helpfully unmounted the filesystem. I don't know much about systemd (avoiding it) but based on my experience with it I could certainly see a case where systemd may actively kill the sdrserv process shortly after it's started by the mm* commands if systemd doesn't expect it to be running. I'd be curious to see the output of /var/adm/ras/mmsdrserv.log from the manager nodes to see if sdrserv is indeed starting but getting harpooned by systemd. -Aaron On 7/28/16 4:16 PM, Marc A Kaplan wrote: > Allow me to restate and demonstrate: > > Even if systemd or any explicit kill signals destroy any/all running > mmcr* and mmsdr* processes, > > simply running mmlsconfig will fire up new mmcr* and mmsdr* processes. > For example: > > ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes > > [root at n2 gpfs-git]# ps auwx | grep mm > root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep > --color=auto mm > > [root at n2 gpfs-git]# mmlsconfig > Configuration data for cluster madagascar.frozen: > ------------------------------------------------- > clusterName madagascar.frozen > ... > worker1Threads 1022 > adminMode central > > File systems in cluster madagascar.frozen: > ------------------------------------------ > /dev/mak > /dev/x1 > /dev/yy > /dev/zz > > ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it > restarts them! > > [root at n2 gpfs-git]# ps auwx | grep mm > root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 > /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac > root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 > root 10358 0.0 0.0 1700488 17636 ? 
Sl 12:58 0:00 python > /usr/lpp/mmfs/bin/mmsysmon.py > root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep > --color=auto mm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 29 16:56:14 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 29 Jul 2016 15:56:14 +0000 Subject: [gpfsug-discuss] mmchqos and already running maintenance commands Message-ID: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> Hi All, Looking for a little clarification here ? in the man page for mmchqos I see: * When you change allocations or mount the file system, a brief delay due to reconfiguration occurs before QoS starts applying allocations. If I?m already running a maintenance command and then I run an mmchqos does that mean that the already running maitenance command will adjust to the new settings or does this only apply to subsequently executed maintenance commands? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 29 18:18:22 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 29 Jul 2016 17:18:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2.1 Released Message-ID: <5E104D88-1A80-4FF2-B721-D0BF4B930CCE@nuance.com> Version 4.2.1 is out on Fix Central and has a bunch of new features and improvements, many of which have been discussed at recent user group meetings. What's new: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1xx_soc.htm Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jul 29 18:57:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 29 Jul 2016 13:57:31 -0400 Subject: [gpfsug-discuss] mmchqos and already running maintenance commands In-Reply-To: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> References: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> Message-ID: mmchqos fs --enable ... maintenance=1234iops ... Will apply the new settings to all currently running and future maintenance commands. There is just a brief delay (I think it is well under 30 seconds) for the new settings to be propagated and become effective on each node. You can use `mmlsqos fs --seconds 70` to observe performance. Better, install gnuplot and run samples/charts/qosplot.pl or hack the script to push the data into your favorite plotter. --marc From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/29/2016 11:57 AM Subject: [gpfsug-discuss] mmchqos and already running maintenance commands Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Looking for a little clarification here ? in the man page for mmchqos I see: * When you change allocations or mount the file system, a brief delay due to reconfiguration occurs before QoS starts applying allocations. 
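As a concrete sketch of the invocation Marc describes in his reply above: the file system name fs0 and the numbers are placeholders, and the exact pool syntax should be checked against the mmchqos man page for the release in use.

# Throttle long-running maintenance commands (mmrestripefs, mmdelsnapshot, ...)
# to 500 IOPS on every pool, leaving normal file system traffic alone.
mmchqos fs0 --enable pool=*,maintenance=500IOPS,other=unlimited

# Watch the effect for a minute; maintenance commands that are already running
# pick up the new allocation after the short reconfiguration delay.
mmlsqos fs0 --seconds 60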
If I?m already running a maintenance command and then I run an mmchqos does that mean that the already running maitenance command will adjust to the new settings or does this only apply to subsequently executed maintenance commands? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Fri Jul 1 11:32:13 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Fri, 1 Jul 2016 10:32:13 +0000 Subject: [gpfsug-discuss] Trapped Inodes Message-ID: Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From makaplan at us.ibm.com Fri Jul 1 17:29:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 1 Jul 2016 12:29:31 -0400 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: References: Message-ID: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. 
The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Sat Jul 2 11:05:34 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Sat, 2 Jul 2016 10:05:34 +0000 Subject: [gpfsug-discuss] Trapped Inodes Message-ID: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. 
A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sat Jul 2 20:16:55 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sat, 2 Jul 2016 15:16:55 -0400 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) 
GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sun Jul 3 11:32:24 2016 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 3 Jul 2016 12:32:24 +0200 Subject: [gpfsug-discuss] Improving Testing Efficiency with IBM Spectrum Scale for Automated Driving Message-ID: In the press today: "Tesla Autopilot partner Mobileye comments on fatal crash, says tech isn?t meant to avoid this type of accident." http://electrek.co/2016/07/01/tesla-autopilot-mobileye-fatal-crash-comment/ "Tesla?s autopilot system was designed in-house and uses a fusion of dozens of internally- and externally-developed component technologies to determine the proper course of action in a given scenario. Since January 2016, Autopilot activates automatic emergency braking in response to any interruption of the ground plane in the path of the vehicle that cross-checks against a consistent radar signature. In the case of this accident, the high, white side of the box truck, combined with a radar signature that would have looked very similar to an overhead sign, caused automatic braking not to fire.? More testing is needed ! Finding a way to improve ADAS/AD testing throughput by factor. 
more HiL tests would have better helped to avoid this accident I guess, as white side box trucks are very common on the roads arent't they? So another strong reason to use GPFS/SpectrumScale/ESS filesystems to provide video files to paralell HiL stations for testing and verification using IBM AREMA for Automotive as essence system in order to find the relevant test cases. Facts: Currently most of the testing is done by copying large video files from some kind of "slow" NAS filer to the HiL stations and running the HiL test case from the internal HiL disks. A typical HiL test run takes 7-9min while the copy alone takes an additional 3-5 min upfront depending on the setup. Together with IBM partner SVA we tested to stream these video files from a ESS GL6 directly to the HiL stations without to copy them first. This worked well and the latency was fine and stable. As a result we could improve the number of HiL test cases per month by a good factor without adding more HiL hardware. See my presentation from the GPFS User Day at SPXXL 2016 in Garching February 17th 2016 9:00 - 9:30 Improving Testing Efficiency with IBM Spectrum Scale for Automated Driving https://www.spxxl.org/sites/default/files/GPFS-AREMA-TSM_et_al_for_ADAS_AD_Testing-Feb2016.pdf More: http://electrek.co/2015/10/14/tesla-reveals-all-the-details-of-its-autopilot-and-its-software-v7-0-slide-presentation-and-audio-conference/ -frank- P.S. HiL = Hardware in the Loop https://en.wikipedia.org/wiki/Hardware-in-the-loop_simulation Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach mailto:kraemerf at de.ibm.com voice: +49-(0)171-3043699 / +4970342741078 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Sun Jul 3 15:55:26 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Sun, 3 Jul 2016 14:55:26 +0000 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. 
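For reference, the dummy-snapshot workaround Marc suggests above amounts to something like the following, where gpfs is only a placeholder for the real device name:

# Create and immediately delete a throw-away snapshot to flush any hidden,
# partially deleted snapshot state, then re-check what is left behind.
mmcrsnapshot gpfs dummy
mmdelsnapshot gpfs dummy
mmlssnapshot gpfs      # should report no snapshots afterwards
mmdf gpfs              # inode counts can take a little while to settle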
On 1 Jul 2016 5:30 pm, Marc A Kaplan > wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Jul 3 19:42:32 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 3 Jul 2016 14:42:32 -0400 Subject: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! 
In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: mmdf statistics are not real-time accurate, there is a trade off in accuracy vs the cost of polling each node that might have the file system mounted. That said, here are some possibilities, in increasing order of impact on users and your possible desperation ;-) A1. Wait a while (at most a few minutes) and see if the mmdf stats are updated. A2. mmchmgr fs another-node may force new stats to be sent to the new fs manager. (Not sure but I expect it will.) B. Briefly quiesce the file system with: mmfsctl fs suspend; mmfsctl fs resume; C. If you have no users active ... I'm pretty sure mmumount fs -a ; mmmount fs -a; will clear the problem ... but there's always D. mmshutdown -a ; mmstartup -a E. If none of those resolve the situation something is hosed -- F. hope that mmfsck can fix it. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/03/2016 10:55 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. 
--marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Mon Jul 4 10:44:02 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 4 Jul 2016 09:44:02 +0000 Subject: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: Hi Marc, Thanks again for the suggestions. An interesting report in the log while the another node took over managing the filesystem: Mon Jul 4 10:24:08.616 2016: [W] Inode space 10 in file system gpfs is approaching the limit for the maximum number of inodes. Inode space 10 was the independent fileset that the snapshot creation/deletion managed to remove. 
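For reference, the refresh steps tried here follow Marc's list, roughly like this (a sketch only; 'gpfs' is the device name and the target manager node is just a placeholder):

   mmchmgr gpfs some-other-node   # move the file system manager so fresh statistics get sent
   mmfsctl gpfs suspend           # briefly quiesce the file system ...
   mmfsctl gpfs resume            # ... then resume it
   mmdf gpfs                      # and re-check the inode counters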
Still getting negative inode numbers reported after migrating manager functions and suspending/resuming the file system: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 I?ll have to wait until later today to try unmounting, daemon recycle or mmfsck. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 03 July 2016 19:43 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! mmdf statistics are not real-time accurate, there is a trade off in accuracy vs the cost of polling each node that might have the file system mounted. That said, here are some possibilities, in increasing order of impact on users and your possible desperation ;-) A1. Wait a while (at most a few minutes) and see if the mmdf stats are updated. A2. mmchmgr fs another-node may force new stats to be sent to the new fs manager. (Not sure but I expect it will.) B. Briefly quiesce the file system with: mmfsctl fs suspend; mmfsctl fs resume; C. If you have no users active ... I'm pretty sure mmumount fs -a ; mmmount fs -a; will clear the problem ... but there's always D. mmshutdown -a ; mmstartup -a E. If none of those resolve the situation something is hosed -- F. hope that mmfsck can fix it. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/03/2016 10:55 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan > wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. 
The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Luke.Raimbach at crick.ac.uk Tue Jul 5 15:25:06 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 5 Jul 2016 14:25:06 +0000 Subject: [gpfsug-discuss] Samba Export Anomalies Message-ID: Hi All, I'm having a frustrating time exporting an Independent Writer AFM fileset through Samba. Native GPFS directories exported through Samba seem to work properly, but when creating an export which points to an AFM IW fileset, I get "Access Denied" errors when trying to create files from an SMB client and even more unusual "Failed to enumerate objects in the container: Access is denied." messages if I try to modify the Access Control Entries through a Windows client. Here is the smb.conf file: ***[BEGIN smb.conf]*** [global] idmap config * : backend = autorid idmap config * : range = 100000-999999 idmap config THECRICK : backend = ad idmap config THECRICK : schema_mode = rfc2307 idmap config THECRICK : range = 30000000-31999999 local master = no realm = THECRICK.ORG security = ADS aio read size = 1 aio write size = 1 async smb echo handler = yes clustering = yes ctdbd socket = /var/run/ctdb/ctdbd.socket ea support = yes force unknown acl user = yes level2 oplocks = no log file = /var/log/samba/log.%m log level = 3 map hidden = yes map readonly = no netbios name = MS_GENERAL printcap name = /etc/printcap printing = lprng server string = Samba Server Version %v socket options = TCP_NODELAY SO_KEEPALIVE TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15 store dos attributes = yes strict allocate = yes strict locking = no unix extensions = no vfs objects = shadow_copy2 syncops fileid streams_xattr gpfs gpfs:dfreequota = yes gpfs:hsm = yes gpfs:leases = yes gpfs:prealloc = yes gpfs:sharemodes = yes gpfs:winattr = yes nfs4:acedup = merge nfs4:chown = yes nfs4:mode = simple notify:inotify = yes shadow:fixinodes = yes shadow:format = @GMT-%Y.%m.%d-%H.%M.%S shadow:snapdir = .snapshots shadow:snapdirseverywhere = yes shadow:sort = desc smbd:backgroundqueue = false smbd:search ask sharemode = false syncops:onmeta = no workgroup = THECRICK winbind enum groups = yes winbind enum users = yes [production_rw] comment = Production writable path = /general/production read only = no [stp-test] comment = STP Test Export path = /general/export/stp/stp-test read-only = no ***[END smb.conf]*** The [production_rw] export is a test directory on the /general filesystem which works from an SMB client. The [stp-test] export is an AFM fileset on the /general filesystem which is a cache of a directory in another GPFS filesystem: ***[BEGIN mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** Attributes for fileset crick.general.export.stp.stp-test: ========================================================== Status Linked Path /general/export/stp/stp-test Id 1 Root inode 1048579 Parent Id 0 Created Fri Jul 1 15:56:48 2016 Comment Inode space 1 Maximum number of inodes 200000 Allocated inodes 100000 Permission change flag chmodAndSetacl afm-associated Yes Target gpfs:///camp/stp/stp-test Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 4 Prefetch Threshold 0 (default) Eviction Enabled yes (default) ***[END mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** Anyone spot any glaringly obvious misconfigurations? Cheers, Luke. Luke Raimbach? 
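One quick comparison that may help rule ACLs in or out (AFM carries ACLs over from the home location, so the cache fileset can end up with stricter permissions than the native directories): dump the NFSv4 ACLs on the working and failing paths and diff them. A sketch, run from a node with /general mounted:

   mmgetacl -k nfs4 /general/production           # path behind the working [production_rw] share
   mmgetacl -k nfs4 /general/export/stp/stp-test  # path behind the failing [stp-test] share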
Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From bbanister at jumptrading.com Tue Jul 5 15:58:35 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 5 Jul 2016 14:58:35 +0000 Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? In-Reply-To: <565240ad49e6476da9c1d3d11312f88c@mbxpsc1.winmail.deshaw.com> References: <565240ad49e6476da9c1d3d11312f88c@mbxpsc1.winmail.deshaw.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB061B491A@CHI-EXCHANGEW1.w2k.jumptrading.com> Wanted to comment that we also hit this issue and agree with Paul that it would be nice in the FAQ to at least have something like the vertical bars that denote changed or added lines in a document, which are seen in the GPFS Admin guides. This should make it easy to see what has changed. Would also be nice to "Follow this page" to get notifications of when the FAQ changes from my IBM Knowledge Center account... or maybe the person that publishes the changes could announce the update on the GPFS - Announce Developer Works page. https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001606 Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sanchez, Paul Sent: Friday, June 03, 2016 2:38 PM To: gpfsug main discussion list (gpfsug-discuss at spectrumscale.org) Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? After some puzzling debugging on our new Broadwell servers, all of which slowly became brick-like upon after getting stuck starting GPFS, we discovered that this was already a known issue in the FAQ. Adding "nosmap" to the kernel command line in grub prevents SMAP from seeing the kernel-userspace memory interactions of GPFS as a reason to slowly grind all cores to a standstill, apparently spinning on stuck locks(?). (Big thanks go to RedHat for turning us on to the answer when we opened a case.) >From https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html, section 3.2: Note: In order for IBM Spectrum Scale on RHEL 7 to run on the Haswell processor * Disable the Supervisor Mode Access Prevention (smap) kernel parameter * Reboot the RHEL 7 node before using GPFS Some observations worth noting: 1. We've been running for a year with Haswell processors and have hundreds of Haswell RHEL7 nodes which do not exhibit this problem. So maybe this only really affects Broadwell CPUs? 2. It would be very nice for SpectrumScale to take a peek at /proc/cpuinfo and /proc/cmdline before starting up, and refuse to break the host when it has affected processors and kernel without "nosmap". Instead, an error message describing the fix would have made my day. 3. I'm going to have to start using a script to diff the FAQ for these gotchas, unless anyone knows of a better way to subscribe just to updates to this doc. Thanks, Paul Sanchez ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Tue Jul 5 19:31:28 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Tue, 5 Jul 2016 14:31:28 -0400 Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? FAQ Updates Message-ID: The PDF version of the FAQ ( http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/gpfsclustersfaq.pdf ) does have change bars. Also at the top it lists the questions that have been changed. Your suggestion for "announcing" new faq version does make sense and I'll email the one responsible for posting the faq. Thank you. Steve Duersch Spectrum Scale (GPFS) FVTest 845-433-7902 IBM Poughkeepsie, New York Message: 2 Date: Tue, 5 Jul 2016 14:58:35 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB061B491A at CHI-EXCHANGEW1.w2k.jumptrading.com> Content-Type: text/plain; charset="us-ascii" Wanted to comment that we also hit this issue and agree with Paul that it would be nice in the FAQ to at least have something like the vertical bars that denote changed or added lines in a document, which are seen in the GPFS Admin guides. This should make it easy to see what has changed. Would also be nice to "Follow this page" to get notifications of when the FAQ changes from my IBM Knowledge Center account... or maybe the person that publishes the changes could announce the update on the GPFS - Announce Developer Works page. https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001606 Cheers, -Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.arnold at unibas.ch Tue Jul 5 19:53:03 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Tue, 5 Jul 2016 20:53:03 +0200 Subject: [gpfsug-discuss] Samba Export Anomalies In-Reply-To: References: Message-ID: <577C020F.2080507@unibas.ch> Hi Luke, probably I don't have enough information about your AFM setup but maybe you could check the ACLs on the export as well as ACLs on the directory to be mounted. If you are using AFM from a home location that has ACLs set then they will also be transferred to cache location. (We ran into similar issues when we had to take over data from a SONAS system that was assigning gid_numbers from an internal mapping table - all had to be cleaned up first before clients could have access through our CES system.) Best Konstantin On 07/05/2016 04:25 PM, Luke Raimbach wrote: > Hi All, > > I'm having a frustrating time exporting an Independent Writer AFM fileset through Samba. 
> > Native GPFS directories exported through Samba seem to work properly, but when creating an export which points to an AFM IW fileset, I get "Access Denied" errors when trying to create files from an SMB client and even more unusual "Failed to enumerate objects in the container: Access is denied." messages if I try to modify the Access Control Entries through a Windows client. > > Here is the smb.conf file: > > ***[BEGIN smb.conf]*** > > [global] > idmap config * : backend = autorid > idmap config * : range = 100000-999999 > idmap config THECRICK : backend = ad > idmap config THECRICK : schema_mode = rfc2307 > idmap config THECRICK : range = 30000000-31999999 > local master = no > realm = THECRICK.ORG > security = ADS > aio read size = 1 > aio write size = 1 > async smb echo handler = yes > clustering = yes > ctdbd socket = /var/run/ctdb/ctdbd.socket > ea support = yes > force unknown acl user = yes > level2 oplocks = no > log file = /var/log/samba/log.%m > log level = 3 > map hidden = yes > map readonly = no > netbios name = MS_GENERAL > printcap name = /etc/printcap > printing = lprng > server string = Samba Server Version %v > socket options = TCP_NODELAY SO_KEEPALIVE TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15 > store dos attributes = yes > strict allocate = yes > strict locking = no > unix extensions = no > vfs objects = shadow_copy2 syncops fileid streams_xattr gpfs > gpfs:dfreequota = yes > gpfs:hsm = yes > gpfs:leases = yes > gpfs:prealloc = yes > gpfs:sharemodes = yes > gpfs:winattr = yes > nfs4:acedup = merge > nfs4:chown = yes > nfs4:mode = simple > notify:inotify = yes > shadow:fixinodes = yes > shadow:format = @GMT-%Y.%m.%d-%H.%M.%S > shadow:snapdir = .snapshots > shadow:snapdirseverywhere = yes > shadow:sort = desc > smbd:backgroundqueue = false > smbd:search ask sharemode = false > syncops:onmeta = no > workgroup = THECRICK > winbind enum groups = yes > winbind enum users = yes > > [production_rw] > comment = Production writable > path = /general/production > read only = no > > [stp-test] > comment = STP Test Export > path = /general/export/stp/stp-test > read-only = no > > ***[END smb.conf]*** > > > The [production_rw] export is a test directory on the /general filesystem which works from an SMB client. The [stp-test] export is an AFM fileset on the /general filesystem which is a cache of a directory in another GPFS filesystem: > > > ***[BEGIN mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** > > Attributes for fileset crick.general.export.stp.stp-test: > ========================================================== > Status Linked > Path /general/export/stp/stp-test > Id 1 > Root inode 1048579 > Parent Id 0 > Created Fri Jul 1 15:56:48 2016 > Comment > Inode space 1 > Maximum number of inodes 200000 > Allocated inodes 100000 > Permission change flag chmodAndSetacl > afm-associated Yes > Target gpfs:///camp/stp/stp-test > Mode independent-writer > File Lookup Refresh Interval 30 (default) > File Open Refresh Interval 30 (default) > Dir Lookup Refresh Interval 60 (default) > Dir Open Refresh Interval 60 (default) > Async Delay 15 (default) > Last pSnapId 0 > Display Home Snapshots no > Number of Gateway Flush Threads 4 > Prefetch Threshold 0 (default) > Eviction Enabled yes (default) > > ***[END mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** > > > Anyone spot any glaringly obvious misconfigurations? > > Cheers, > Luke. > > Luke Raimbach? 
> Senior HPC Data and Storage Systems Engineer, > The Francis Crick Institute, > Gibbs Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From r.sobey at imperial.ac.uk Wed Jul 6 10:37:29 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 09:37:29 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> Message-ID: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Jul 6 10:47:16 2016 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 6 Jul 2016 10:47:16 +0100 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> Message-ID: <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: > > Quick followup on this. Doing some more samba debugging (i.e. > increasing log levels!) 
and come up with the following: > > [2016/07/06 10:07:35.602080, 3] > ../source3/smbd/vfs.c:1322(check_reduced_name) > > check_reduced_name: > admin/ict/serviceoperations/slough_project/Slough_Layout reduced to > /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout > > [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) > > unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) > returning 0644 > > [2016/07/06 10:07:35.613374, 0] > ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) > > * user does not have list permission on snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots* > > [2016/07/06 10:07:35.613416, 0] > ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) > > access denied on listing snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots > > [2016/07/06 10:07:35.613434, 0] > ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) > > FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, > failed - NT_STATUS_ACCESS_DENIED. > > [2016/07/06 10:07:47.648557, 3] > ../source3/smbd/service.c:1138(close_cnum) > > 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to > service IPC$ > > Any takers? I cannot run mmgetacl on the .snapshots folder at all, as > root. A snapshot I just created to make sure I had full control on the > folder: (39367 is me, I didn?t run this command on a CTDB node so the > UID mapping isn?t working). > > [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 > > #NFSv4 ACL > > #owner:root > > #group:root > > group:74036:r-x-:allow:FileInherit:DirInherit:Inherited > > (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > > (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL > (-)WRITE_ATTR (-)WRITE_NAMED > > user:39367:rwxc:allow:FileInherit:DirInherit:Inherited > > (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > > (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL > (X)WRITE_ATTR (X)WRITE_NAMED > > *From:*gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of > *Sobey, Richard A > *Sent:* 20 June 2016 16:03 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but > our customers have come to like previous versions and indeed it is > sort of a selling point for us. > > Samba is the only thing we?ve changed recently after the badlock > debacle so I?m tempted to blame that, but who knows. > > If (when) I find out I?ll let everyone know. > > Richard > > *From:*gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of > *Buterbaugh, Kevin L > *Sent:* 20 June 2016 15:56 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Hi Richard, > > I can?t answer your question but I can tell you that we have > experienced either the exact same thing you are or something very > similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 > and it persists even after upgraded to GPFS 4.2.0.3 and the very > latest sernet-samba. > > And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* > upgrade SAMBA versions at that time. Therefore, I believe that > something changed in GPFS. That doesn?t mean it?s GPFS? 
fault, of > course. SAMBA may have been relying on a > bugundocumented feature in GPFS that IBM fixed > for all I know, and I?m obviously speculating here. > > The problem we see is that the .snapshots directory in each folder can > be cd?d to but is empty. The snapshots are all there, however, if you: > > cd //.snapshots/ taken>/rest/of/path/to/folder/in/question > > This obviously prevents users from being able to do their own recovery > of files unless you do something like what you describe, which we are > unwilling to do for security reasons. We have a ticket open with DDN? > > Kevin > > On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > > wrote: > > Hi all > > Can someone clarify if the ability for Windows to view snapshots > as Previous Versions is exposed by SAMBA or GPFS? Basically, if > suddenly my users cannot restore files from snapshots over a CIFS > share, where should I be looking? > > I don?t know when this problem occurred, but within the last few > weeks certainly our users with full control over their data now > see no previous versions available, but if we export their fileset > and set ?force user = root? all the snapshots are available. > > I think the answer is SAMBA, right? We?re running GPFS 3.5 and > sernet-samba 4.2.9. > > Many thanks > > Richard > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss atspectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research and > Education > > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 10:55:14 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 09:55:14 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Sure. 
It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. 
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn't run this command on a CTDB node so the UID mapping isn't working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we've changed recently after the badlock debacle so I'm tempted to blame that, but who knows. If (when) I find out I'll let everyone know. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can't answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn't mean it's GPFS' fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I'm obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd'd to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN... Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don't know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set "force user = root" all the snapshots are available. I think the answer is SAMBA, right? We're running GPFS 3.5 and sernet-samba 4.2.9. 
Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com [http://pixitmedia.com/sig/sig-cio.jpg] This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Jul 6 12:50:56 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 6 Jul 2016 11:50:56 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 13:22:53 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 12:22:53 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch, (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = 
Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

From christof.schmitt at us.ibm.com Wed Jul 6 15:45:57 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 6 Jul 2016 07:45:57 -0700 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID:

The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469)

From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel - sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ...appears to be the correct output and is consistent with someone else's GPFS cluster where it is working. Cheers Richard

From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point.
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From r.sobey at imperial.ac.uk Wed Jul 6 15:54:25 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 14:54:25 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID:

Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affected by the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :)

-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469)

From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel - sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ...appears to be the correct output and is consistent with someone else's GPFS cluster where it is working. Cheers Richard

From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point.
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 6 16:21:06 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 6 Jul 2016 15:21:06 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

From r.sobey at imperial.ac.uk Wed Jul 6 16:23:16 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 15:23:16 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Message-ID:

I'm afraid it's not a typo ...

[root at server gpfs]# rpm -qa | grep sernet
sernet-samba-ctdb-tests-4.2.9-19.el6.x86_64
sernet-samba-common-4.2.9-19.el6.x86_64
sernet-samba-winbind-4.2.9-19.el6.x86_64
sernet-samba-ad-4.2.9-19.el6.x86_64
sernet-samba-libs-4.2.9-19.el6.x86_64
sernet-samba-4.2.9-19.el6.x86_64
sernet-samba-libsmbclient-devel-4.2.9-19.el6.x86_64
sernet-samba-libsmbclient0-4.2.9-19.el6.x86_64
sernet-samba-ctdb-4.2.9-19.el6.x86_64
sernet-samba-libwbclient-devel-4.2.9-19.el6.x86_64
sernet-samba-client-4.2.9-19.el6.x86_64
sernet-samba-debuginfo-4.2.9-19.el6.x86_64

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2016 16:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, Is that a typo in the version? We're also using Sernet Samba but we've got 4.3.9... Kevin

On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affected by the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :)

-----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here?
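
On the question quoted just above - how to tell whether a particular Samba build is affected by the first commit and whether it already carries the second one - a couple of quick, non-authoritative checks can be run before rebuilding anything. This is only a sketch: the VFS module path, the unpacked source directory name and the use of gitweb's plain-patch URL form are assumptions, not details taken from this thread.

  # Installed-package side: the SerNet changelog may (or may not) mention a
  # backport of the shadow_copy2 fix, so treat a miss as "unknown" rather
  # than "unpatched".
  smbd --version
  rpm -q --changelog sernet-samba-libs | grep -iE 'shadow_copy|snapdir|11658' | head
  # The denial message in the log lives in the shadow_copy2 module, so finding
  # it only proves the new access check (the first commit) is compiled in, not
  # the follow-up fix; the module path is a guess for an el6 build.
  strings /usr/lib64/samba/vfs/shadow_copy2.so | grep 'does not have list permission'

  # Source-tree side: fetch the second commit as a plain patch and try a
  # reverse dry-run; it only applies cleanly in reverse if the change is
  # already present in the tree.
  curl -s 'https://git.samba.org/?p=samba.git;a=patch;h=fdbca5e13a0375d7f18639679a627e67c3df647a' > snapdir-fix.patch
  cd samba-4.2.9          # wherever the matching sources are unpacked
  patch -p1 -R --dry-run < ../snapdir-fix.patch && echo "fix already present"
  patch -p1 --dry-run < ../snapdir-fix.patch && echo "fix missing, applies cleanly"

If the reverse dry-run fails and the forward one succeeds, rebuilding the package with that one extra patch, as suggested above, is the obvious test.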
Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. ) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. 
It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. 
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. 
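
Worth noting against the symptoms quoted above: the check that fails in the vfs_shadow_copy2 trace is essentially "can this user list the snapshot directory?", so the same condition can be exercised straight from the shell on a file server before changing anything in Samba. A minimal sketch, assuming a hypothetical test account and reusing the paths from the debug output; the GPFS device name is likewise a placeholder.

  # Mirror what check_access_snapdir() asks for: list access on the snapdir,
  # as the end user rather than root ("testuser" is hypothetical).
  sudo -u testuser ls -l /gpfs/prd/groupspace/ic/admin/ict/.snapshots
  # Confirm the snapshots themselves exist at the GPFS level ("gpfsprd" is an
  # assumed device name, not taken from this thread).
  mmlssnapshot gpfsprd

If the listing works for an ordinary user but the share still logs the access-denied message, that points back at the smbd-side check itself, which is what the patch discussion above is about.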
Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

From r.sobey at imperial.ac.uk Wed Jul 6 16:26:36 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 15:26:36 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Message-ID:

By the way, we are planning to go to CES / 4.2.x in a matter of weeks, but understanding this problem was quite important for me. Perhaps knowing now that the fix is probably to install a different version of Samba, we'll probably leave it alone. Thank you everyone for your help, Richard

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 06 July 2016 16:23 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions I'm afraid it's not a typo ...
[root at server gpfs]# rpm -qa | grep sernet sernet-samba-ctdb-tests-4.2.9-19.el6.x86_64 sernet-samba-common-4.2.9-19.el6.x86_64 sernet-samba-winbind-4.2.9-19.el6.x86_64 sernet-samba-ad-4.2.9-19.el6.x86_64 sernet-samba-libs-4.2.9-19.el6.x86_64 sernet-samba-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient0-4.2.9-19.el6.x86_64 sernet-samba-ctdb-4.2.9-19.el6.x86_64 sernet-samba-libwbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-client-4.2.9-19.el6.x86_64 sernet-samba-debuginfo-4.2.9-19.el6.x86_64 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2016 16:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = 
Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
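The "user does not have list permission on snapdir" line in the trace above is the message printed by the check_access_snapdir() routine that the quoted security patch adds to vfs_shadow_copy2, so one rough way to confirm that an installed build already performs this stricter check is to look for that message text in the module binary. A minimal sketch, assuming the Sernet packages install the VFS modules under /usr/lib64/samba/vfs (that path is an assumption and varies by packaging):

# report the running smbd version
smbd -V
# if the new snapdir permission check is compiled in, its log message is present in the module
strings /usr/lib64/samba/vfs/shadow_copy2.so | grep -i "list permission on snapdir"

If the string is found, the build already enforces the directory-list check on the snapshot directory, which is consistent with the NT_STATUS_ACCESS_DENIED seen in the trace; it does not tell you whether the follow-up vfs_gpfs fix mentioned elsewhere in the thread is also present.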
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From christof.schmitt at us.ibm.com Wed Jul 6 17:19:40 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 6 Jul 2016 09:19:40 -0700 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: The first patch is at least in Samba 4.2 and newer. The patch to the vfs_gpfs module is only in Samba 4.3 and newer. So any of these should fix your problem: - Add the vfs_gpfs patch to the source code of Samba 4.2.9 and recompile the code. - Upgrade to Sernet Samba 4.3.x or newer - Change the Samba services to the ones provided through CES Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 07:54 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
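To pin down which upstream Samba releases actually contain the two commits referenced in this thread, one option is to ask git which release tags contain them. A minimal sketch, assuming a local clone of the upstream tree (the clone URL is an assumption; the gitweb links above point at the same repository):

git clone git://git.samba.org/samba.git
cd samba
# tags containing the commit Christof identified as triggering the new behaviour
git tag --contains acbb4ddb6876c15543c5370e6d27faacebc8a231 | grep '^samba-4'
# tags containing the vfs_gpfs fix he suggests is missing
git tag --contains fdbca5e13a0375d7f18639679a627e67c3df647a | grep '^samba-4'

Vendor builds such as the Sernet packages backport fixes on their own schedule, so their changelogs still need to be checked separately (for example with rpm -q --changelog sernet-samba).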
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Thu Jul 7 14:00:17 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:00:17 +0000 Subject: [gpfsug-discuss] Migration policy confusion Message-ID: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Thu Jul 7 14:10:52 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 7 Jul 2016 13:10:52 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Thu Jul 7 14:12:12 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 7 Jul 2016 15:12:12 +0200 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 14:16:19 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:16:19 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <640419CE-E989-47CD-999D-65EC249C9B8A@siriuscom.com> Olaf, thanks. Yes the plan is to have SSD?s for the system pool ultimately but this is just a test system that I?m using to try and understand teiring better. The files (10 or so of them) are each 200MB in size. Mark From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Migration policy confusion HI , first of all, given by the fact, that the MetaData is stored in system pool .. system should be the "fastest" pool / underlaying disks ... you have.. with a "slow" access to the MD, access to data is very likely affected.. (except for cached data, where MD is cached) in addition.. tell us, how "big" your test files are ? .. you moved by mmapplypolicy Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 03:00 PM Subject: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. 
Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 14:16:53 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:16:53 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. 
Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Thu Jul 7 14:18:41 2016 From: service at metamodul.com (- -) Date: Thu, 7 Jul 2016 15:18:41 +0200 (CEST) Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <318576846.22999.a23b5e71-bef0-4fc7-9542-12ecb401ec9e.open-xchange@email.1und1.de> An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 7 15:20:12 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 7 Jul 2016 10:20:12 -0400 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Message-ID: At the very least, LOOK at the messages output by the mmapplypolicy command at the beginning and end. The "occupancy" stats for each pool are shown BEFORE and AFTER the command does its work. In even more detail, it shows you how many files and how many KB of data were (or will be or would be) migrated. Also, options matter. ReadTheFineManuals. -I test vs -I defer vs -I yes. To see exactly which files are being migrated, use -L 2 To see exactly which files are being selected by your rule(s), use -L 3 And for more details about the files being skipped over, etc, etc, -L 6 Gee, I just checked the doc myself, I forgot some of the details and it's pretty good. Admittedly mmapplypolicy is a complex command. You can do somethings simply, only knowing a few options and policy rules, BUT... As my father used to say, "When all else fails, read the directions!" -L n Controls the level of information displayed by the mmapplypolicy command. Larger values indicate the display of more detailed information. These terms are used: candidate file A file that matches a MIGRATE, DELETE, or LIST policy rule. chosen file A candidate file that has been scheduled for action. 
These are the valid values for n: 0 Displays only serious errors. 1 Displays some information as the command runs, but not for each file. This is the default. 2 Displays each chosen file and the scheduled migration or deletion action. 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. For examples and more information on this flag, see the section: The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide. --marc From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 09:17 AM Subject: Re: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 15:30:33 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 14:30:33 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Message-ID: <877D722D-8CF5-496F-AAE5-7C0190E54D50@siriuscom.com> Thanks all. I realized that my file creation command was building 200k size files instead of the 200MB files. I fixed that and now I see the mmapplypolicy command take a bit more time and show accurate data as well as my bytes are now on the proper NSDs. It?s always some little thing that the human messes up isn?t it? ? From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 9:20 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Migration policy confusion At the very least, LOOK at the messages output by the mmapplypolicy command at the beginning and end. The "occupancy" stats for each pool are shown BEFORE and AFTER the command does its work. In even more detail, it shows you how many files and how many KB of data were (or will be or would be) migrated. Also, options matter. ReadTheFineManuals. -I test vs -I defer vs -I yes. To see exactly which files are being migrated, use -L 2 To see exactly which files are being selected by your rule(s), use -L 3 And for more details about the files being skipped over, etc, etc, -L 6 Gee, I just checked the doc myself, I forgot some of the details and it's pretty good. Admittedly mmapplypolicy is a complex command. You can do somethings simply, only knowing a few options and policy rules, BUT... As my father used to say, "When all else fails, read the directions!" -L n Controls the level of information displayed by the mmapplypolicy command. Larger values indicate the display of more detailed information. These terms are used: candidate file A file that matches a MIGRATE, DELETE, or LIST policy rule. chosen file A candidate file that has been scheduled for action. These are the valid values for n: 0 Displays only serious errors. 1 Displays some information as the command runs, but not for each file. This is the default. 2 Displays each chosen file and the scheduled migration or deletion action. 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 
6 Displays the same information as 5, plus non-candidate files and their attributes. For examples and more information on this flag, see the section: The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide. --marc From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 09:17 AM Subject: Re: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Thu Jul 7 20:44:15 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Thu, 7 Jul 2016 15:44:15 -0400 Subject: [gpfsug-discuss] Introductions Message-ID: All, My name is Brian Marshall; I am a computational scientist at Virginia Tech. We have ~2PB GPFS install we are about to expand this Summer and I may have some questions along the way. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jul 8 03:09:30 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Jul 2016 22:09:30 -0400 Subject: [gpfsug-discuss] mmpmon gfis fields question Message-ID: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Does anyone know what the fields in the mmpmon gfis output indicate? # socat /var/mmfs/mmpmon/mmpmonSocket - _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 _node_ local_node mmpmon gfis _response_ begin mmpmon gfis _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 _tu_ 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 _r_ 0 _w_ 0 _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ Here's my best guess: _d_ number of disks in the filesystem _br_ bytes read from disk _bw_ bytes written to disk _c_ cache ops _r_ read ops _w_ write ops _oc_ open() calls _cc_ close() calls _rdc_ read() calls _wc_ write() calls _dir_ readdir calls _iu_ inode update count _irc_ inode read count _idc_ inode delete count _icc_ inode create count _bc_ bytes read from cache _sch_ stat cache hits _scm_ stat cache misses This is all because the mmpmon fs_io_s command doesn't give me a way that I can find to distinguish block/stat cache hits from cache misses which makes it harder to pinpoint misbehaving applications on the system. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Jul 8 03:16:19 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 7 Jul 2016 19:16:19 -0700 Subject: [gpfsug-discuss] mmpmon gfis fields question In-Reply-To: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> References: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Message-ID: Hi, this is a undocumented mmpmon call, so you are on your own, but here is the correct description : _n_ IP address of the node responding. This is the address by which GPFS knows the node. _nn_ The name by which GPFS knows the node. _rc_ The reason/error code. In this case, the reply value is 0 (OK). _t_ Current time of day in seconds (absolute seconds since Epoch (1970)). _tu_ Microseconds part of the current time of day. _cl_ The name of the cluster that owns the file system. _fs_ The name of the file system for which data are being presented. _d_ The number of disks in the file system. _br_ Total number of bytes read from disk (not counting those read from cache.) _bw_ Total number of bytes written, to both disk and cache. _c_ The total number of read operations supplied from cache. _r_ The total number of read operations supplied from disk. _w_ The total number of write operations, to both disk and cache. _oc_ Count of open() call requests serviced by GPFS. 
_cc_ Number of close() call requests serviced by GPFS. _rdc_ Number of application read requests serviced by GPFS. _wc_ Number of application write requests serviced by GPFS. _dir_ Number of readdir() call requests serviced by GPFS. _iu_ Number of inode updates to disk. _irc_ Number of inode reads. _idc_ Number of inode deletions. _icc_ Number of inode creations. _bc_ Number of bytes read from the cache. _sch_ Number of stat cache hits. _scm_ Number of stat cache misses. On Thu, Jul 7, 2016 at 7:09 PM, Aaron Knister wrote: > Does anyone know what the fields in the mmpmon gfis output indicate? > > # socat /var/mmfs/mmpmon/mmpmonSocket - > _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 _node_ > local_node > mmpmon gfis > _response_ begin mmpmon gfis > _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 _tu_ > 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 _r_ 0 _w_ 0 > _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ > > > Here's my best guess: > > _d_ number of disks in the filesystem > _br_ bytes read from disk > _bw_ bytes written to disk > _c_ cache ops > _r_ read ops > _w_ write ops > _oc_ open() calls > _cc_ close() calls > _rdc_ read() calls > _wc_ write() calls > _dir_ readdir calls > _iu_ inode update count > _irc_ inode read count > _idc_ inode delete count > _icc_ inode create count > _bc_ bytes read from cache > _sch_ stat cache hits > _scm_ stat cache misses > > This is all because the mmpmon fs_io_s command doesn't give me a way that > I can find to distinguish block/stat cache hits from cache misses which > makes it harder to pinpoint misbehaving applications on the system. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jul 8 04:13:59 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Jul 2016 23:13:59 -0400 Subject: [gpfsug-discuss] mmpmon gfis fields question In-Reply-To: References: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Message-ID: Ah, thank you! That's a huge help. My preference, of course, would be to use documented calls but I'm already down that rabbit hole calling nsd_ds directly b/c the snmp agent chokes and dies a horrible death with 3.5k nodes and the number of NSDs we have. On 7/7/16 10:16 PM, Sven Oehme wrote: > Hi, > > this is a undocumented mmpmon call, so you are on your own, but here is > the correct description : > > > _n_ > > > > IP address of the node responding. This is the address by which GPFS > knows the node. > > _nn_ > > > > The name by which GPFS knows the node. > > _rc_ > > > > The reason/error code. In this case, the reply value is 0 (OK). > > _t_ > > > > Current time of day in seconds (absolute seconds since Epoch (1970)). > > _tu_ > > > > Microseconds part of the current time of day. > > _cl_ > > > > The name of the cluster that owns the file system. > > _fs_ > > > > The name of the file system for which data are being presented. > > _d_ > > > > The number of disks in the file system. > > _br_ > > > > Total number of bytes read from disk (not counting those read from cache.) > > _bw_ > > > > Total number of bytes written, to both disk and cache. 
> > _c_ > > > > The total number of read operations supplied from cache. > > _r_ > > > > The total number of read operations supplied from disk. > > _w_ > > > > The total number of write operations, to both disk and cache. > > _oc_ > > > > Count of open() call requests serviced by GPFS. > > _cc_ > > > > Number of close() call requests serviced by GPFS. > > _rdc_ > > > > Number of application read requests serviced by GPFS. > > _wc_ > > > > Number of application write requests serviced by GPFS. > > _dir_ > > > > Number of readdir() call requests serviced by GPFS. > > _iu_ > > > > Number of inode updates to disk. > > _irc_ > > > > Number of inode reads. > > _idc_ > > > > Number of inode deletions. > > _icc_ > > > > Number of inode creations. > > _bc_ > > > > Number of bytes read from the cache. > > _sch_ > > > > Number of stat cache hits. > > _scm_ > > > > Number of stat cache misses. > > > On Thu, Jul 7, 2016 at 7:09 PM, Aaron Knister > wrote: > > Does anyone know what the fields in the mmpmon gfis output indicate? > > # socat /var/mmfs/mmpmon/mmpmonSocket - > _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 > _node_ local_node > mmpmon gfis > _response_ begin mmpmon gfis > _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 > _tu_ 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 > _r_ 0 _w_ 0 _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ > > > Here's my best guess: > > _d_ number of disks in the filesystem > _br_ bytes read from disk > _bw_ bytes written to disk > _c_ cache ops > _r_ read ops > _w_ write ops > _oc_ open() calls > _cc_ close() calls > _rdc_ read() calls > _wc_ write() calls > _dir_ readdir calls > _iu_ inode update count > _irc_ inode read count > _idc_ inode delete count > _icc_ inode create count > _bc_ bytes read from cache > _sch_ stat cache hits > _scm_ stat cache misses > > This is all because the mmpmon fs_io_s command doesn't give me a way > that I can find to distinguish block/stat cache hits from cache > misses which makes it harder to pinpoint misbehaving applications on > the system. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Mon Jul 11 17:33:02 2016 From: mweil at wustl.edu (Matt Weil) Date: Mon, 11 Jul 2016 11:33:02 -0500 Subject: [gpfsug-discuss] CES sizing guide In-Reply-To: <375ba33c-894f-215f-4044-e4995761f640@wustl.edu> References: <375ba33c-894f-215f-4044-e4995761f640@wustl.edu> Message-ID: Hello all, > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node > > Is there any more guidance on this as one socket can be a lot of cores and memory today. > > Thanks > ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From mimarsh2 at vt.edu Tue Jul 12 14:12:17 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 12 Jul 2016 09:12:17 -0400 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: All, I have a Spectrum Scale 4.1 cluster serving data to 4 different client clusters (~800 client nodes total). I am looking for ways to monitor filesystem performance to uncover network bottlenecks or job usage patterns affecting performance. I received this info below from an IBM person. Does anyone have examples of aggregating mmperfmon data? Is anyone doing something different? "mmpmon does not currently aggregate cluster-wide data. As of SS 4.1.x you can look at "mmperfmon query" as well, but it also primarily only provides node specific data. The tools are built to script performance data but there aren't any current scripts available for you to use within SS (except for what might be on the SS wiki page). It would likely be something you guys would need to build, that's what other clients have done." Thank you, Brian Marshall Virginia Tech - Advanced Research Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 12 14:19:49 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 12 Jul 2016 13:19:49 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: <39E581EB-978D-4103-A2BC-FE4FF57B3608@nuance.com> Hi Brian I have a couple of pointers: - We have been running mmpmon for a while now across multiple clusters, sticking the data in external database for analysis. This has been working pretty well, but we are transitioning to (below) - SS 4.1 and later have built in zimon for collecting a wealth of performance data - this feeds into the built in GUI. But, there is bridge tools that IBM has built internally and keeps promising to release (I talked about it at the last SS user group meeting at Argonne) that allows use of Grafana with the zimon data. This is working well for us. Let me know if you want to discuss details and I will be happy to share my experiences and pointers in looking at the performance data. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Tuesday, July 12, 2016 at 9:12 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Aggregating filesystem performance All, I have a Spectrum Scale 4.1 cluster serving data to 4 different client clusters (~800 client nodes total). I am looking for ways to monitor filesystem performance to uncover network bottlenecks or job usage patterns affecting performance. I received this info below from an IBM person. Does anyone have examples of aggregating mmperfmon data? Is anyone doing something different? "mmpmon does not currently aggregate cluster-wide data. As of SS 4.1.x you can look at "mmperfmon query" as well, but it also primarily only provides node specific data. The tools are built to script performance data but there aren't any current scripts available for you to use within SS (except for what might be on the SS wiki page). 
It would likely be something you guys would need to build, that's what other clients have done." Thank you, Brian Marshall Virginia Tech - Advanced Research Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jul 12 14:23:12 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 12 Jul 2016 15:23:12 +0200 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Wed Jul 13 09:49:00 2016 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Wed, 13 Jul 2016 10:49:00 +0200 Subject: [gpfsug-discuss] GPFS / Spectrum Scale is now officially certified with SAP HANA on IBM Power Systrems Message-ID: Hi GPFS / Spectrum Scale "addicts", for all those using GPFS / Spectrum Scale "commercially" - IBM has certified it yesterday with SAP and it is NOW officially supported with HANA on IBM Power Systems. Please see the following SAP Note concerning the details. 2055470 - HANA on POWER Planning and Installation Specifics - Central Note: (See attached file: 2055470.pdf) Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Hechtsheimer Str. 2 Email: ckrafft at de.ibm.com 55131 Mainz Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52106945.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2055470.pdf Type: application/pdf Size: 101863 bytes Desc: not available URL: From mimarsh2 at vt.edu Wed Jul 13 14:43:43 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 13 Jul 2016 09:43:43 -0400 Subject: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Message-ID: Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 13 14:59:20 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 13 Jul 2016 13:59:20 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Message-ID: Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a "minimal" install (yes, install using the GUI, don't shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn't find anything specific to this. -------------- next part -------------- An HTML attachment was scrubbed... 
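For readers following this thread: the replies below point at the 4.2 install toolkit as the easiest way to prepare a CES node, and on a minimal RHEL install the flow looks roughly like the sketch that follows. The host name, IP addresses and package list are placeholders, and the exact spectrumscale sub-commands should be checked against the 4.2 Knowledge Center before use.

# Rough sketch only - names and addresses are examples, not a recipe.
yum install -y ksh net-tools kernel-devel gcc make    # indicative GPFS prerequisites, not an exhaustive list

cd /usr/lpp/mmfs/4.2.0.0/installer
./spectrumscale setup -s 10.0.0.10                    # node that drives the install toolkit
./spectrumscale node add ces1.example.com -p          # -p marks a protocol (CES) node
./spectrumscale config protocols -e 10.0.0.101,10.0.0.102    # CES export IP pool
./spectrumscale config protocols -f cesfs -m /gpfs/cesfs     # file system and mount point for the CES shared root
./spectrumscale enable smb
./spectrumscale enable nfs                            # object is left out; it pulls in many more packages
./spectrumscale deploy --precheck
./spectrumscale deploy

As noted further down the thread, 4.2.0 CES is only supported up to RHEL 7.1, so the same steps on 7.2 should wait for the 4.2.1 release.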
URL: From Robert.Oesterlin at nuance.com Wed Jul 13 17:06:14 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 13 Jul 2016 16:06:14 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: Hi Brian I haven't seen any problems at all with the monitoring. (impacting performance). As for the Zimon metrics - let me assemble that and maybe discuss indetail off the mailing list (I've BCC'd you on this posting. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 9:43 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jul 13 17:08:32 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 13 Jul 2016 16:08:32 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 13 17:18:18 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 13 Jul 2016 16:18:18 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: References: Message-ID: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> Hi Bob, I am also in the process of setting up monitoring under GPFS (and it will always be GPFS) 4.2 on our test cluster right now and would also be interested in the experiences of others more experienced and knowledgeable than myself. Would you considering posting to the list? Or is there sensitive information that you don?t want to share on the list? Thanks? Kevin On Jul 13, 2016, at 11:06 AM, Oesterlin, Robert > wrote: Hi Brian I haven't seen any problems at all with the monitoring. (impacting performance). As for the Zimon metrics - let me assemble that and maybe discuss indetail off the mailing list (I've BCC'd you on this posting. 
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Wednesday, July 13, 2016 at 9:43 AM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 13 17:29:08 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 13 Jul 2016 16:29:08 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> References: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> Message-ID: <90988968-C133-4965-9A91-13AE1DB8C670@nuance.com> Sure, will do. Nothing sensitive here, just a fairly complex discussion for a mailing list! We'll see - give me a day or so. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 12:18 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance Hi Bob, I am also in the process of setting up monitoring under GPFS (and it will always be GPFS) 4.2 on our test cluster right now and would also be interested in the experiences of others more experienced and knowledgeable than myself. Would you considering posting to the list? Or is there sensitive information that you don?t want to share on the list? Thanks? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jul 13 18:09:24 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 13 Jul 2016 17:09:24 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: The gpfs.protocols package drags in all the openstack swift dependencies (lots of packages). I normally don't want the object support, so just install the nfs-ganesha, samba and zimon packages (plus rsync and python-ldap which I've figured out are needed). But, please beware that rhel7.2 isn't supported with v4.2.0 CES, and I've seen kernel crashes triggered by samba when ignoring that.. -jf ons. 13. jul. 2016 kl. 18.08 skrev Simon Thompson (Research Computing - IT Services) : > > The spectrumscale-protocols rpm (I think that was it) should include all > the os dependencies you need for the various ss bits. > > If you were adding the ss rpms by hand, then there are packages you need > to include. Unfortunately the protocols rpm adds all the protocols whether > you want them or not from what I remember. 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [ > r.sobey at imperial.ac.uk] > Sent: 13 July 2016 14:59 > To: 'gpfsug-discuss at spectrumscale.org' > Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts > > Hi all > > Where can I find documentation on how to prepare RHEL 7.2 for an > installation of SS 4.2 which will be a CES server? Is a ?minimal? install > (yes, install using the GUI, don?t shoot me) sufficient or should I choose > a different canned option. > > Thanks > > Richard > PS I tried looking in the FAQ > http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html > but I couldn?t find anything specific to this. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Jul 14 08:55:33 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 14 Jul 2016 07:55:33 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: Aha. I naively thought it would be. It?s no problem to use 7.1. Thanks for the heads up, and the responses. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 13 July 2016 18:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts The gpfs.protocols package drags in all the openstack swift dependencies (lots of packages). I normally don't want the object support, so just install the nfs-ganesha, samba and zimon packages (plus rsync and python-ldap which I've figured out are needed). But, please beware that rhel7.2 isn't supported with v4.2.0 CES, and I've seen kernel crashes triggered by samba when ignoring that.. -jf ons. 13. jul. 2016 kl. 18.08 skrev Simon Thompson (Research Computing - IT Services) >: The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From taylorm at us.ibm.com Fri Jul 15 18:41:27 2016 From: taylorm at us.ibm.com (Michael L Taylor) Date: Fri, 15 Jul 2016 10:41:27 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Sample RHEL 7.2 config Spectrum Scale install toolkit In-Reply-To: References: Message-ID: Hi Richard, The Knowledge Center should help guide you to prepare a RHEL7 node for installation with the /usr/lpp/mmfs/4.2.0.x/installer/spectrumscale install toolkit being a good way to install CES and all of its prerequisites: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_loosein.htm For a high level quick overview of the install toolkit: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocols%20Quick%20Overview%20for%20IBM%20Spectrum%20Scale As mentioned, RHEL7.2 will be supported with CES with the 4.2.1 release due out shortly.... RHEL7.1 on 4.2 will work. Today's Topics: 1. Re: Aggregating filesystem performance (Oesterlin, Robert) (Brian Marshall) 2. Sample RHEL 7.2 config / anaconda scripts (Sobey, Richard A) 3. Re: Aggregating filesystem performance (Oesterlin, Robert) 4. Re: Sample RHEL 7.2 config / anaconda scripts (Simon Thompson (Research Computing - IT Services)) 5. Re: Aggregating filesystem performance (Buterbaugh, Kevin L) Message: 4 Date: Wed, 13 Jul 2016 16:08:32 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Message-ID: Content-Type: text/plain; charset="Windows-1252" The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Sun Jul 17 02:04:39 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sat, 16 Jul 2016 21:04:39 -0400 Subject: [gpfsug-discuss] segment size and sub-block size Message-ID: All, When picking blockSize and segmentSize on RAID6 8+2 LUNs, I have see 2 optimal theories. 1) Make blockSize = # Data Disks * segmentSize e.g. in the RAID6 8+2 case, 8 MB blockSize = 8 * 1 MB segmentSize This makes sense to me as every GPFS block write is a full stripe write 2) Make blockSize = 32 (number sub blocks) * segmentSize; also make sure the blockSize is a multiple of #data disks * segmentSize I don't know enough about GPFS to know how subblocks interact and what tradeoffs this makes. 
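For concreteness, the arithmetic behind the two options above works out as in the short sketch below; the 8+2 geometry, the 1 MB segment size and the 32 sub-blocks per block are taken from the question itself, and the wasted-space angle is the one Aaron picks up in his reply further down.

# Sub-block arithmetic for the two options (sizes in KiB); runnable anywhere.
SEG=1024           # segment size per data disk (1 MiB)
DATA_DISKS=8       # RAID6 8+2
SUBBLOCKS=32       # sub-blocks per GPFS block

BLOCK1=$(( DATA_DISKS * SEG ))   # option 1: 8192 KiB (8 MiB), exactly one full stripe
BLOCK2=$(( SUBBLOCKS * SEG ))    # option 2: 32768 KiB (32 MiB), four full stripes

for BLOCK in $BLOCK1 $BLOCK2; do
    SUB=$(( BLOCK / SUBBLOCKS ))           # the sub-block is the smallest allocatable unit
    echo "blockSize ${BLOCK} KiB -> sub-block ${SUB} KiB (an 8 KiB file still occupies ${SUB} KiB)"
done

Both options keep GPFS block writes full-stripe aligned; the difference is that option 2 quadruples the minimum allocation for small files.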
Can someone explain (or point to a doc) about sub block mechanics and when to optimize for that? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Jul 17 02:20:31 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sat, 16 Jul 2016 21:20:31 -0400 Subject: [gpfsug-discuss] segment size and sub-block size In-Reply-To: References: Message-ID: <9287130c-70ba-207c-221d-f236bad8acaf@nasa.gov> Hi Brian, We use a 128KB segment size on our DDNs and a 1MB block size and it works quite well for us (throughput in the 10's of gigabytes per second). IIRC the sub block (blockSize/32) is the smallest unit of allocatable disk space. If that's not tuned well to your workload you can end up with a lot of wasted space on the filesystem. In option #1, the smallest unit of allocatable space is 256KB. If you have millions of files that are say 8K in size you can do the math on lost space. In option #2, if you're using the same 1MB segment size from the option 1 scenario it gets even worse. Hope that helps. This might also help (https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_frags.htm). -Aaron On 7/16/16 9:04 PM, Brian Marshall wrote: > All, > > When picking blockSize and segmentSize on RAID6 8+2 LUNs, I have see 2 > optimal theories. > > > 1) Make blockSize = # Data Disks * segmentSize > e.g. in the RAID6 8+2 case, 8 MB blockSize = 8 * 1 MB segmentSize > > This makes sense to me as every GPFS block write is a full stripe write > > 2) Make blockSize = 32 (number sub blocks) * segmentSize; also make sure > the blockSize is a multiple of #data disks * segmentSize > > I don't know enough about GPFS to know how subblocks interact and what > tradeoffs this makes. > > Can someone explain (or point to a doc) about sub block mechanics and > when to optimize for that? > > Thank you, > Brian Marshall > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mimarsh2 at vt.edu Sun Jul 17 03:56:14 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sat, 16 Jul 2016 22:56:14 -0400 Subject: [gpfsug-discuss] SSD LUN setup Message-ID: When setting up SSDs to be used as a fast tier storage pool, are people still doing RAID6 LUNs? I think write endurance is good enough now that this is no longer a big concern (maybe a small concern). I could be wrong. I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Jul 17 14:05:35 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 17 Jul 2016 13:05:35 +0000 Subject: [gpfsug-discuss] SSD LUN setup Message-ID: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> Thinly provisioned (compressed) metadata volumes is unsupported according to IBM. See the GPFS FAQ here, question 4.12: "Placing GPFS metadata on an NSD backed by a thinly provisioned volume is dangerous and unsupported." 
http://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Saturday, July 16, 2016 at 9:56 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] SSD LUN setup I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Sun Jul 17 15:21:13 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sun, 17 Jul 2016 10:21:13 -0400 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> References: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> Message-ID: That's very good advice. In my specific case, I am looking at lowlevel setup of the NSDs in a SSD storage pool with metadata stored elsewhere (on another SSD system). I am wondering if stuff like SSD pagepool size comes into play or if I just look at the segment size from the storage enclosure RAID controller. It sounds like SSDs should be used just like HDDs: group them into RAID6 LUNs. Write endurance is good enough now that longevity is not a problem and there are plenty of IOPs to do parity work. Does this sound right? Anyone doing anything else? Brian On Sun, Jul 17, 2016 at 9:05 AM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Thinly provisioned (compressed) metadata volumes is unsupported according > to IBM. See the GPFS FAQ here, question 4.12: > > > > "Placing GPFS metadata on an NSD backed by a thinly provisioned volume is > dangerous and unsupported." > > > > http://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > > > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > > > > *From: * on behalf of Brian > Marshall > *Reply-To: *gpfsug main discussion list > *Date: *Saturday, July 16, 2016 at 9:56 PM > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[EXTERNAL] [gpfsug-discuss] SSD LUN setup > > > > I have read about other products doing RAID1 with deduplication and > compression to take less than the 50% capacity hit. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sun Jul 17 22:49:53 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sun, 17 Jul 2016 22:49:53 +0100 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: On 17/07/16 03:56, Brian Marshall wrote: > When setting up SSDs to be used as a fast tier storage pool, are people > still doing RAID6 LUNs? I think write endurance is good enough now that > this is no longer a big concern (maybe a small concern). I could be wrong. > > I have read about other products doing RAID1 with deduplication and > compression to take less than the 50% capacity hit. > There are plenty of ways in which an SSD can fail that does not involve problems with write endurance. The idea of using any disks in anything other than a test/dev GPFS file system that you simply don't care about if it goes belly up, that are not RAID or similarly protected is in my view fool hardy in the extreme. 
It would be like saying that HDD's can only fail due to surface defects on the platers, and then getting stung when the drive motor fails or the drive electronics stop working or better yet the drive electrics go puff literately in smoke and there is scorch marks on the PCB. Or how about a drive firmware issue that causes them to play dead under certain work loads, or drive firmware issues that just cause them to die prematurely in large numbers. These are all failure modes I have personally witnessed. My sample size for SSD's is still way to small to have seen lots of wacky failure modes, but I don't for one second believe that given time I won't see them. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Greg.Lehmann at csiro.au Mon Jul 18 00:23:09 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 17 Jul 2016 23:23:09 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Message-ID: Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I've seen reference to a kernel version that is in SLES 12 SP1, but I'm not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 18 01:39:29 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jul 2016 00:39:29 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: OK, after a bit of a delay due to a hectic travel week, here is some more information on my GPFS performance collection. At the bottom, I have links to my server and client zimon config files and a link to my presentation at SSUG Argonne in June. I didn't actually present it but included it in case there was interest. I used to do a home brew system of period calls to mmpmon to collect data, sticking them into a kafka database. This was a bit cumbersome and when SS 4.2 arrived, I switched over to the built in performance sensors (zimon) to collect the data. IBM has a "as-is" bridge between Grafana and the Zimon collector that works reasonably well - they were supposed to release it but it's been delayed - I will ask about it again and post more information if I get it. My biggest struggle with the zimon configuration is the large memory requirement of the collector with large clusters (many clients, file systems, NSDs). I ended up deploying a 6 collector federation of 16gb per collector for my larger clusters -0 even then I have to limit the number of stats and amount of time I retain it. IBM is aware of the memory issue and I believe they are looking at ways to reduce it. As for what specific metrics I tend to look at: gpfs_fis_bytes_read (written) - aggregated file system read and write stats gpfs_nsdpool_bytes_read (written) - aggregated pool stats, as I have data and metadata split gpfs_fs_tot_disk_wait_rd (wr) - NSD disk wait stats These seem to make the most sense for me to get an overall sense of how things are going. I have a bunch of other more details dashboards for individual file systems and clients that help me get details. The built-in SS GUI is pretty good for small clusters, and is getting some improvements in 4.2.1 that might make me take a closer look at it again. 
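For anyone who wants to start with the simpler home-brew approach mentioned at the top of this message before taking on zimon and Grafana, a minimal sketch of cluster-wide aggregation over mmpmon fs_io_s is below. The node-list file and the ssh fan-out are assumptions about the environment, and because the mmpmon counters are cumulative you would difference successive samples to turn them into rates.

#!/bin/bash
# Sketch: sum mmpmon fs_io_s byte counters per file system across a set of nodes.
NODELIST=${1:-/root/gpfs-nodes.list}     # one host name per line (assumed to exist)

for node in $(cat "$NODELIST"); do
    ssh "$node" 'echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p -s'
done | awk '
    $1 == "_fs_io_s_" {
        for (i = 1; i <= NF; i++) {
            if ($i == "_fs_") fs = $(i + 1)
            if ($i == "_br_") br[fs] += $(i + 1)
            if ($i == "_bw_") bw[fs] += $(i + 1)
        }
    }
    END {
        printf "%-20s %18s %18s\n", "filesystem", "bytes_read", "bytes_written"
        for (f in br) printf "%-20s %18.0f %18.0f\n", f, br[f], bw[f]
    }'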
I also look at the RPC waiters stats - no present in 4.2.0 grafana, but I hear are coming in 4.2.1 My SSUG Argonne Presentation (I didn't talk due to time constraints): http://files.gpfsug.org/presentations/2016/anl-june/SSUG_Nuance_PerfTools.pdf Zimon server config file: https://www.dropbox.com/s/gvtfhhqfpsknfnh/ZIMonSensors.cfg.server?dl=0 Zimon client config file: https://www.dropbox.com/s/k5i6rcnaco4vxu6/ZIMonSensors.cfg.client?dl=0 Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 8:43 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Jul 18 15:07:51 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 18 Jul 2016 10:07:51 -0400 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: @Jonathan, I completely agree on the SSD failure. I wasn't suggesting that better write endurance made them impervious to failures, just that I read a few articles from ~3-5 years back saying that RAID5 or RAID6 would destroy your SSDs and have a really high probability of all SSDs failing at the same time as the # of writes were equal on all SSDs in the RAID group. I think that's no longer the case and RAID6 on SSDs is fine. I was looking for examples of what others have done: RAID6, using GPFS data replicas, or some other thing I don't know about that better takes advantage of SSD architecture. Background - I am a storage noob Also is the @Jonathan proper list etiquette? Thanks everyone to great advice I've been getting. Thank you, Brian On Sun, Jul 17, 2016 at 5:49 PM, Jonathan Buzzard wrote: > On 17/07/16 03:56, Brian Marshall wrote: > >> When setting up SSDs to be used as a fast tier storage pool, are people >> still doing RAID6 LUNs? I think write endurance is good enough now that >> this is no longer a big concern (maybe a small concern). I could be >> wrong. >> >> I have read about other products doing RAID1 with deduplication and >> compression to take less than the 50% capacity hit. >> >> > There are plenty of ways in which an SSD can fail that does not involve > problems with write endurance. The idea of using any disks in anything > other than a test/dev GPFS file system that you simply don't care about if > it goes belly up, that are not RAID or similarly protected is in my view > fool hardy in the extreme. > > It would be like saying that HDD's can only fail due to surface defects on > the platers, and then getting stung when the drive motor fails or the drive > electronics stop working or better yet the drive electrics go puff > literately in smoke and there is scorch marks on the PCB. Or how about a > drive firmware issue that causes them to play dead under certain work > loads, or drive firmware issues that just cause them to die prematurely in > large numbers. > > These are all failure modes I have personally witnessed. My sample size > for SSD's is still way to small to have seen lots of wacky failure modes, > but I don't for one second believe that given time I won't see them. > > JAB. > > -- > Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Mon Jul 18 18:34:38 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 18 Jul 2016 19:34:38 +0200 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: Hi Brian, write endurance is only one of the things you need to watch when running small IOs on RAID5/RAID6. However, while SSDs are much faster than HDDs when it comes to reads, they are only somewhat faster when it comes to writes. The RMW penalty on small writes to RAID5 / RAID6 will incur a higher actual data write rate at your SSD devices than you see going from your OS / file system to the storage. How much higher depends on the actual IO sizes to the RAID device relative to your full stripe width. Mind that the write caches on all levels will help here by coalescing the IOs into something larger than what the application issues. Beyond a certain point, however, if you go to smaller and smaller IOs (in relation to your stripe widths) you might want to look for some other redundancy code than RAID5/RAID6 or related parity-using mechanisms, even if you pay the capacity price of simple data replication (RAID1, or 3-way replication in GNR). That depends of course, but is worth a consideration. 
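To put rough numbers on the effect described above, here is a small sketch of the device-level write amplification, assuming an 8+2 RAID6 with 1 MiB strips and no cache coalescing (all figures are illustrative):

# Device-level write amplification for RAID6 8+2 (illustrative numbers, sizes in KiB).
DATA_DISKS=8; PARITY_DISKS=2; STRIP_KIB=1024
STRIPE_KIB=$(( DATA_DISKS * STRIP_KIB ))

# Full-stripe write: parity is computed from the new data alone, nothing is read back.
FULL=$(( STRIPE_KIB + PARITY_DISKS * STRIP_KIB ))
echo "full stripe: ${STRIPE_KIB} KiB of user data -> ${FULL} KiB written to the SSDs (1.25x)"

# Small read-modify-write: old data, P and Q are read; new data, P and Q are written.
IO_KIB=4
WRITTEN=$(( IO_KIB * (1 + PARITY_DISKS) ))
echo "4 KiB RMW: ${IO_KIB} KiB of user data -> ${WRITTEN} KiB written (3x), plus ${WRITTEN} KiB of extra reads"

The caches Uwe mentions will usually soften this, but it is the reason the effective write rate at the devices can sit well above what the file system reports.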
I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. There are plenty of ways in which an SSD can fail that does not involve problems with write endurance. The idea of using any disks in anything other than a test/dev GPFS file system that you simply don't care about if it goes belly up, that are not RAID or similarly protected is in my view fool hardy in the extreme. It would be like saying that HDD's can only fail due to surface defects on the platers, and then getting stung when the drive motor fails or the drive electronics stop working or better yet the drive electrics go puff literately in smoke and there is scorch marks on the PCB. Or how about a drive firmware issue that causes them to play dead under certain work loads, or drive firmware issues that just cause them to die prematurely in large numbers. These are all failure modes I have personally witnessed. My sample size for SSD's is still way to small to have seen lots of wacky failure modes, but I don't for one second believe that given time I won't see them. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 19 08:59:43 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 19 Jul 2016 07:59:43 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Message-ID: I thought it was supported, but that CES (Integrated protocols support) is only supported up to 7.1 Simon From: > on behalf of "Greg.Lehmann at csiro.au" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 18 July 2016 at 00:23 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I?ve seen reference to a kernel version that is in SLES 12 SP1, but I?m not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 20 01:17:23 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 20 Jul 2016 00:17:23 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale In-Reply-To: References: Message-ID: You are right. An IBMer cleared it up for me. 
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, 19 July 2016 6:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale I thought it was supported, but that CES (Integrated protocols support) is only supported up to 7.1 Simon From: > on behalf of "Greg.Lehmann at csiro.au" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 18 July 2016 at 00:23 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody actually running it on SLES 12 SP1? I've seen reference to a kernel version that is in SLES 12 SP1, but I'm not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Jul 20 13:21:19 2016 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Jul 2016 13:21:19 +0100 Subject: [gpfsug-discuss] New AFM Toys Message-ID: Just noticed this in the 4.2.0-4 release notes: * Fix the readdir performance issue of independent writer mode filesets in the AFM environment. Introduce a new configuration option afmDIO at the fileset level to replicate data from cache to home using direct IO. Before I start weeping tears of joy, does anyone have any further info on this (the issue and the new parameter)? Does this apply to both NFS and GPFS transports? It looks very promising! -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 20 15:42:09 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Jul 2016 14:42:09 +0000 Subject: [gpfsug-discuss] Migrating to CES from CTDB Message-ID: Hi all, Does anyone have any experience migrating from CTDB and GPFS 3.5 to CES and GPFS 4.2? We've got a plan of how to do it, but the issue is doing it without causing any downtime to the front end. We're using "secrets and keytab" for auth in smb.conf. So the only way I think we can do it is build out the 4.2 servers and somehow integrate them into the existing cluster (front end cluster) - or more accurately - keep the same FQDN of the cluster and just change DNS to point the FQDN to the new servers, and remove it from the existing ones. The big question is: will this work in theory? The downtime option involves shutting down the CTDB cluster, deleting the AD object, renaming the new cluster and starting CES SMB to allow it to join AD with the same name. 
This takes about 15 minutes while allowing for AD replication and the TTL on the DNS. This then has to be repeated when the original CTDB nodes have been reinstalled. I feel like I'm rambling but I just can't find any guides on migrating protocols from CTDB to CES, just migrations of GPFS itself. Plus my knowledge of samba isn't all that :) Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Jul 20 16:15:39 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 20 Jul 2016 16:15:39 +0100 Subject: [gpfsug-discuss] Migrating to CES from CTDB In-Reply-To: References: Message-ID: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> On Wed, 2016-07-20 at 14:42 +0000, Sobey, Richard A wrote: [SNIP] > > The downtime option involves shutting down the CTDB cluster, deleting > the AD object, renaming the new cluster and starting CES SMB to allow > it to join AD with the same name. This takes about 15 minutes while > allowing for AD replication and the TTL on the DNS. This then has to > be repeated when the original CTDB nodes have been reinstalled. > Can you not reduce the TTL on the DNS to as low as possible prior to the changeover to reduce the required downtime for the switch over? You are also aware that you can force the AD replication so no need to wait for that, other than the replication time, which should be pretty quick? I also believe that it is not necessary to delete the AD object. Just leave it as is, and it will be overwritten when you join the new CES cluster. Saves you a step. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From r.sobey at imperial.ac.uk Wed Jul 20 16:23:00 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Jul 2016 15:23:00 +0000 Subject: [gpfsug-discuss] Migrating to CES from CTDB In-Reply-To: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> References: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> Message-ID: I was thinking of that. Current TTL is 900s, we can probably lower it on a temporary basis to facilitate the change. I wasn't aware about the AD object, no... I presume the existing object will simply be updated when the new cluster joins, which in turn will trigger a replication of it anyway? Thanks Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: 20 July 2016 16:16 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Migrating to CES from CTDB On Wed, 2016-07-20 at 14:42 +0000, Sobey, Richard A wrote: [SNIP] > > The downtime option involves shutting down the CTDB cluster, deleting > the AD object, renaming the new cluster and starting CES SMB to allow > it to join AD with the same name. This takes about 15 minutes while > allowing for AD replication and the TTL on the DNS. This then has to > be repeated when the original CTDB nodes have been reinstalled. > Can you not reduce the TTL on the DNS to as low as possible prior to the changeover to reduce the required downtime for the switch over? You are also aware that you can force the AD replication so no need to wait for that, other than the replication time, which should be pretty quick? I also believe that it is not necessary to delete the AD object. Just leave it as is, and it will be overwritten when you join the new CES cluster. Saves you a step. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Wed Jul 20 18:23:32 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 17:23:32 +0000 Subject: [gpfsug-discuss] More fun with Policies Message-ID: Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E281.87334DC0] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30212 bytes Desc: image001.png URL: From jamiedavis at us.ibm.com Wed Jul 20 19:17:09 2016 From: jamiedavis at us.ibm.com (James Davis) Date: Wed, 20 Jul 2016 18:17:09 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D1E281.87334DC0.png Type: image/png Size: 30212 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 20 19:24:02 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 18:24:02 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: References: Message-ID: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> Thanks James. I did just that (running 4.2.0.3). mmchpolicy fs1 DEFAULT. It didn?t fix the gui however I wonder if it?s a bug in the gui code or something like that. From: on behalf of James Davis Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:17 PM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] More fun with Policies Hi Mark, I don't have an answer about the GUI change, but I believe as of 4.1 you can "delete" a policy by using mmchpolicy like this: #14:15:36# c42an3:~ # mmchpolicy c42_fs2_dmapi DEFAULT GPFS: 6027-2809 Validated policy 'DEFAULT': GPFS: 6027-799 Policy `DEFAULT' installed and broadcast to all nodes. 
#14:16:06# c42an3:~ # mmlspolicy c42_fs2_dmapi -L /* DEFAULT */ /* Store data in the first data pool or system pool */ If your release doesn't support that, try a simple policy like RULE 'default' SET POOL 'system' or RULE 'default' SET POOL '' Cheers, Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] More fun with Policies Date: Wed, Jul 20, 2016 1:24 PM Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E289.FAA9E450] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30213 bytes Desc: image001.png URL: From Mark.Bush at siriuscom.com Wed Jul 20 19:45:41 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 18:45:41 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> References: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> Message-ID: I killed my browser cache and all is well now. From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] More fun with Policies Thanks James. I did just that (running 4.2.0.3). mmchpolicy fs1 DEFAULT. It didn?t fix the gui however I wonder if it?s a bug in the gui code or something like that. 
From: on behalf of James Davis Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:17 PM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] More fun with Policies Hi Mark, I don't have an answer about the GUI change, but I believe as of 4.1 you can "delete" a policy by using mmchpolicy like this: #14:15:36# c42an3:~ # mmchpolicy c42_fs2_dmapi DEFAULT GPFS: 6027-2809 Validated policy 'DEFAULT': GPFS: 6027-799 Policy `DEFAULT' installed and broadcast to all nodes. #14:16:06# c42an3:~ # mmlspolicy c42_fs2_dmapi -L /* DEFAULT */ /* Store data in the first data pool or system pool */ If your release doesn't support that, try a simple policy like RULE 'default' SET POOL 'system' or RULE 'default' SET POOL '' Cheers, Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] More fun with Policies Date: Wed, Jul 20, 2016 1:24 PM Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E28D.0011FCE0] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30214 bytes Desc: image001.png URL: From Mark.Bush at siriuscom.com Wed Jul 20 21:47:13 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 20:47:13 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario Message-ID: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. 
Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Wed Jul 20 22:15:49 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 20 Jul 2016 17:15:49 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? 
I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Jul 20 22:27:03 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 20 Jul 2016 21:27:03 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Hi Mark, We do this. 
We have sync replication between two sites with extended san and Ethernet fabric between them. We then use copies=2 for both metadata and data (most filesets). We then also have a vm quorum node which runs on VMware in a fault tolerant cluster. We tested split braining the sites before we went into production. It does work, but we did find some interesting failure modes doing the testing, so do that and push it hard. We multicluster our ces nodes (yes technically I know isn't supported), and again have a quorum vm which has dc affinity to the storage cluster one to ensure ces fails to the same DC. You may also want to look at readReplicaPolicy=local and Infiniband fabric numbers, and probably subnets to ensure your clients prefer the local site for read. Write of course needs enough bandwidth between sites to keep it fast. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 20 July 2016 21:47 To: gpfsug main discussion list Subject: [gpfsug-discuss] NDS in Two Site scenario For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From makaplan at us.ibm.com Wed Jul 20 22:52:25 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 20 Jul 2016 17:52:25 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. 
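To make the replication side of the setup Simon describes above concrete, a minimal sketch (assuming a file system named fs1 that was created with maximum replication -M 2 -R 2, and NSDs at each site defined with different failure groups):

  mmchfs fs1 -m 2 -r 2                  # default metadata and data replicas of 2
  mmchconfig readReplicaPolicy=local    # prefer reads from the local copy
  mmrestripefs fs1 -R                   # re-replicate existing files to match the new defaults

The failure groups are what tell GPFS that the two copies must land on different sites; the quorum layout (a third site, an odd number of quorum nodes, or a tiebreaker disk) is still what decides which half keeps running in a split.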
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Jul 20 23:34:53 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 20 Jul 2016 23:34:53 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: <3e1dc902-ca52-4ab1-2ca3-e51ba3f18b32@buzzard.me.uk> On 20/07/16 22:15, Ken Hill wrote: [SNIP] > You can further isolate failure by increasing quorum (odd numbers). > > The way quorum works is: The majority of the quorum nodes need to be up > to survive an outage. > > - With 3 quorum nodes you can have 1 quorum node failures and continue > filesystem operations. > - With 5 quorum nodes you can have 2 quorum node failures and continue > filesystem operations. > - With 7 quorum nodes you can have 3 quorum node failures and continue > filesystem operations. > - etc > The alternative is a tiebreaker disk to prevent split brain clusters. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Mark.Bush at siriuscom.com Thu Jul 21 00:33:06 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 23:33:06 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. 
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E2B5.280C2EA0] [cid:image002.png at 01D1E2B5.280C2EA0] [cid:image003.png at 01D1E2B5.280C2EA0] [cid:image004.png at 01D1E2B5.280C2EA0] [cid:image005.png at 01D1E2B5.280C2EA0] [cid:image006.png at 01D1E2B5.280C2EA0] [cid:image007.png at 01D1E2B5.280C2EA0] [cid:image008.png at 01D1E2B5.280C2EA0] [cid:image009.png at 01D1E2B5.280C2EA0] [cid:image010.png at 01D1E2B5.280C2EA0] [cid:image011.png at 01D1E2B5.280C2EA0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: From Mark.Bush at siriuscom.com Thu Jul 21 00:34:29 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 23:34:29 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Thu Jul 21 01:02:01 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 20 Jul 2016 20:02:01 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. 
If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 979 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1426 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1369 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4454 bytes Desc: not available URL: From YARD at il.ibm.com Thu Jul 21 05:48:09 2016 From: YARD at il.ibm.com (Yaron Daniel) Date: Thu, 21 Jul 2016 07:48:09 +0300 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: HI U must remember the following: Network vlan should be the same between 2 Main Sites - since the CES IP failover will not work... U can define : Site1 - 2 x NSD servers + Quorum Site2 - 2 x NSD servers + Quorum GPFS FS replication define with failure groups. (Latency must be very low in order to have write performance). Site3 - 1 x Quorum + Local disk as Tie Breaker Disk. (Desc Only) Hope this help. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Ken Hill" To: gpfsug main discussion list Date: 07/21/2016 03:02 AM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. 
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 979 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1426 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1369 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4454 bytes Desc: not available URL: From ashish.thandavan at cs.ox.ac.uk Thu Jul 21 11:26:02 2016 From: ashish.thandavan at cs.ox.ac.uk (Ashish Thandavan) Date: Thu, 21 Jul 2016 11:26:02 +0100 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience Message-ID: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Dear all, Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. 
We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. Is there a recommended bonding mode? If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? Thank you, Regards, Ash -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk From Mark.Bush at siriuscom.com Thu Jul 21 13:45:12 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 12:45:12 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). 
Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E323.CF61AAE0] [cid:image002.png at 01D1E323.CF61AAE0] [cid:image003.png at 01D1E323.CF61AAE0] [cid:image004.png at 01D1E323.CF61AAE0] [cid:image005.png at 01D1E323.CF61AAE0] [cid:image006.png at 01D1E323.CF61AAE0] [cid:image007.png at 01D1E323.CF61AAE0] [cid:image008.png at 01D1E323.CF61AAE0] [cid:image009.png at 01D1E323.CF61AAE0] [cid:image010.png at 01D1E323.CF61AAE0] [cid:image011.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E323.CF61AAE0] [cid:image013.png at 01D1E323.CF61AAE0] [cid:image014.png at 01D1E323.CF61AAE0] [cid:image015.png at 01D1E323.CF61AAE0] [cid:image016.png at 01D1E323.CF61AAE0] [cid:image017.png at 01D1E323.CF61AAE0] [cid:image018.png at 01D1E323.CF61AAE0] [cid:image019.png at 01D1E323.CF61AAE0] [cid:image020.png at 01D1E323.CF61AAE0] [cid:image021.png at 01D1E323.CF61AAE0] [cid:image022.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? 
I mean a single site I think I get; it's when there are two datacenters and I don't want two clusters, typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan at buzzard.me.uk Thu Jul 21 14:01:06 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 21 Jul 2016 14:01:06 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: <1469106066.26989.33.camel@buzzard.phy.strath.ac.uk> On Thu, 2016-07-21 at 12:45 +0000, Mark.Bush at siriuscom.com wrote: > This is where my confusion sits. So if I have two sites, and two NDS > Nodes per site with 1 NSD (to keep it simple), do I just present the > physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to > Site2 NSD Nodes? Unless you are going to use a tiebreaker disk you need an odd number of NSD nodes. If you don't, you risk a split-brain cluster, and god only knows what will happen to your file system in such a scenario. > Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and > the same at Site2? (Assuming SAN and not direct attached in this > case). I know I'm being persistent but this for some reason confuses > me. That's one way of doing it, assuming that you have extended your SAN across both sites. You present all LUNs to all NSD nodes regardless of which site they are at. With this method you can use a tiebreaker disk. Alternatively you present the LUNs at site one to the NSD servers at site one and all the LUNs at site two to the NSD servers at site two, and set failure and replication groups up appropriately. However, in this scenario it is critical to have an odd number of NSD servers, because you can only use tiebreaker disks where every NSD node can see the physical disk, i.e. it is SAN attached (either FC or iSCSI) to all NSD nodes. That said, as others have pointed out, beyond a metropolitan area network I can't see multi-site GPFS working. You could, I guess, punt iSCSI over the internet, but performance is going to be awful, and iSCSI and GPFS just don't mix in my experience. JAB. -- Jonathan A.
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Thu Jul 21 14:02:03 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 21 Jul 2016 13:02:03 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: It depends. What are you protecting against? Either will work depending on your acceptable failure modes. I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site. In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written as the NSD servers in Site 2 could write to the SAN in Site 1. In the first example you would need to rest ripe the file-system when brining the Site 1 back online to ensure data is replicated.\ Simon From: > on behalf of "Mark.Bush at siriuscom.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 13:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E323.CF61AAE0] [cid:image002.png at 01D1E323.CF61AAE0] [cid:image003.png at 01D1E323.CF61AAE0] [cid:image004.png at 01D1E323.CF61AAE0] [cid:image005.png at 01D1E323.CF61AAE0] [cid:image006.png at 01D1E323.CF61AAE0] [cid:image007.png at 01D1E323.CF61AAE0] [cid:image008.png at 01D1E323.CF61AAE0] [cid:image009.png at 01D1E323.CF61AAE0] [cid:image010.png at 01D1E323.CF61AAE0] [cid:image011.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? 
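
To make the arrangement Jonathan and Simon describe above concrete (one failure group per site, two copies of data and metadata, and a restripe after an outage), here is a hedged sketch of the NSD stanzas and file system creation for a two-site layout. All NSD, server, device and file system names are illustrative, and the options should be checked against the mmcrnsd, mmcrfs and mmrestripefs documentation for your release:

    # stanza file describing one LUN per site, served by that site's NSD servers
    cat > two_site_nsd.stanza <<'EOF'
    %nsd: nsd=site1_nsd1
      device=/dev/mapper/lun_site1
      servers=nsdnode1,nsdnode2
      usage=dataAndMetadata
      failureGroup=1
      pool=system
    %nsd: nsd=site2_nsd1
      device=/dev/mapper/lun_site2
      servers=nsdnode3,nsdnode4
      usage=dataAndMetadata
      failureGroup=2
      pool=system
    EOF

    # create the NSDs, then a file system that keeps two copies of data and
    # metadata, one per failure group (i.e. one per site)
    mmcrnsd -F two_site_nsd.stanza
    mmcrfs gpfs1 -F two_site_nsd.stanza -m 2 -M 2 -r 2 -R 2

    # after a failed site comes back, restore the replica copies
    mmrestripefs gpfs1 -r

Whether the replica writes then cross the inter-site SAN or the IP network comes down to which servers are listed in each stanza and which nodes can actually see which LUNs, which is exactly the trade-off being discussed in this thread.
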
From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E323.CF61AAE0] [cid:image013.png at 01D1E323.CF61AAE0] [cid:image014.png at 01D1E323.CF61AAE0] [cid:image015.png at 01D1E323.CF61AAE0] [cid:image016.png at 01D1E323.CF61AAE0] [cid:image017.png at 01D1E323.CF61AAE0] [cid:image018.png at 01D1E323.CF61AAE0] [cid:image019.png at 01D1E323.CF61AAE0] [cid:image020.png at 01D1E323.CF61AAE0] [cid:image021.png at 01D1E323.CF61AAE0] [cid:image022.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 1622 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 1598 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 1073 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 980 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 1565 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 1314 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 1169 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image019.png Type: image/png Size: 1427 bytes Desc: image019.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 1370 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.png Type: image/png Size: 1245 bytes Desc: image021.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.png Type: image/png Size: 4455 bytes Desc: image022.png URL: From viccornell at gmail.com Thu Jul 21 14:02:02 2016 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Jul 2016 14:02:02 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: The annoying answer is "it depends?. I ran a system with all of the NSDs being visible to all of the NSDs on both sites and that worked well. However there are lots of questions to answer: Where are the clients going to live? Will you have clients in both sites or just one? Is it dual site working or just DR? Where will the majority of the writes happen? Would you rather that traffic went over the SAN or the IP link? Do you have a SAN link between the 2 sites? Which is faster, the SAN link between sites or the IP link between the sites? Are they the same link? Are they both redundant, which is the most stable? The answers to these questions would drive the design of the gpfs filesystem. For example if there are clients on only on site A , you might then make the NSD servers on site A the primary NSD servers for all of the NSDs on site A and site B - then you send the replica blocks over the SAN. You also could make a matrix of the failure scenarios you want to protect against, the consequences of the failure and the likelihood of failure etc. That will also inform the design. Does that help? Vic > On 21 Jul 2016, at 1:45 pm, Mark.Bush at siriuscom.com wrote: > > This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. > > Site1 > NSD Node1 > ---NSD1 ---Physical LUN1 from SAN1 > NSD Node2 > > > Site2 > NSD Node3 > ---NSD2 ?Physical LUN2 from SAN2 > NSD Node4 > > > Or > > > Site1 > NSD Node1 > ----NSD1 ?Physical LUN1 from SAN1 > ----NSD2 ?Physical LUN2 from SAN2 > NSD Node2 > > Site 2 > NSD Node3 > ---NSD2 ? Physical LUN2 from SAN2 > ---NSD1 --Physical LUN1 from SAN1 > NSD Node4 > > > Site 3 > Node5 Quorum > > > > From: > on behalf of Ken Hill > > Reply-To: gpfsug main discussion list > > Date: Wednesday, July 20, 2016 at 7:02 PM > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > > Yes - it is a cluster. > > The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). 
> > Regards, > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > From: "Mark.Bush at siriuscom.com " > > To: gpfsug main discussion list > > Date: 07/20/2016 07:33 PM > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > So in this scenario Ken, can server3 see any disks in site1? > > From: > on behalf of Ken Hill > > Reply-To: gpfsug main discussion list > > Date: Wednesday, July 20, 2016 at 4:15 PM > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > > > Site1 Site2 > Server1 (quorum 1) Server3 (quorum 2) > Server2 Server4 > > > > > SiteX > Server5 (quorum 3) > > > > > You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. > > You can further isolate failure by increasing quorum (odd numbers). > > The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. > > - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. > - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. > - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. > - etc > > Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > From: "Mark.Bush at siriuscom.com " > > To: gpfsug main discussion list > > Date: 07/20/2016 04:47 PM > Subject: [gpfsug-discuss] NDS in Two Site scenario > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. > > > > Mark R. Bush| Solutions Architect > Mobile: 210.237.8415 | mark.bush at siriuscom.com > Sirius Computer Solutions | www.siriuscom.com > 10100 Reunion Place, Suite 500, San Antonio, TX 78216 > > This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. 
This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. > > Sirius Computer Solutions _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 21 14:12:58 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 13:12:58 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: <10D22907-E641-41AF-A31A-17755288E005@siriuscom.com> Thanks Vic&Simon, I?m totally cool with ?it depends? the solution guidance is to achieve a Highly Available FS. And there is Dark Fibre between the two locations. FileNet is the application and they want two things. Ability to write in both locations (maybe close to at the same time not necessarily the same files though) and protect against any site failure. So in my mind my Scenario 1 would work as long as I had copies=2 and restripe are acceptable. Is my Scenario 2 I would still have to restripe if the SAN in site 1 went down. I?m looking for the simplest approach that provides the greatest availability. From: on behalf of "Simon Thompson (Research Computing - IT Services)" Reply-To: gpfsug main discussion list Date: Thursday, July 21, 2016 at 8:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario It depends. What are you protecting against? Either will work depending on your acceptable failure modes. I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site. In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written as the NSD servers in Site 2 could write to the SAN in Site 1. In the first example you would need to rest ripe the file-system when brining the Site 1 back online to ensure data is replicated.\ Simon From: > on behalf of "Mark.Bush at siriuscom.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 13:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. 
Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E327.B037C650] [cid:image002.png at 01D1E327.B037C650] [cid:image003.png at 01D1E327.B037C650] [cid:image004.png at 01D1E327.B037C650] [cid:image005.png at 01D1E327.B037C650] [cid:image006.png at 01D1E327.B037C650] [cid:image007.png at 01D1E327.B037C650] [cid:image008.png at 01D1E327.B037C650] [cid:image009.png at 01D1E327.B037C650] [cid:image010.png at 01D1E327.B037C650] [cid:image011.png at 01D1E327.B037C650] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. 
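
As a hedged, concrete illustration of the quorum layout Ken sketches above (either a third, power-isolated quorum node, or tiebreaker disks when the quorum nodes can all see the disks), the usual commands are mmchnode and mmchconfig. Node and NSD names below are illustrative, and some releases require GPFS to be down before tiebreakerDisks can be changed, so check the documentation for your level first:

    # make the third-site server a quorum node (Server5 at SiteX in the layout above)
    mmchnode --quorum -N server5

    # or, where the disks are SAN attached to all quorum nodes,
    # nominate up to three tiebreaker disks
    mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"

    # verify the quorum configuration and node states
    mmlscluster
    mmlsconfig tiebreakerDisks
    mmgetstate -a
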
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E327.B037C650] [cid:image013.png at 01D1E327.B037C650] [cid:image014.png at 01D1E327.B037C650] [cid:image015.png at 01D1E327.B037C650] [cid:image016.png at 01D1E327.B037C650] [cid:image017.png at 01D1E327.B037C650] [cid:image018.png at 01D1E327.B037C650] [cid:image019.png at 01D1E327.B037C650] [cid:image020.png at 01D1E327.B037C650] [cid:image021.png at 01D1E327.B037C650] [cid:image022.png at 01D1E327.B037C650] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1622 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1598 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1073 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image004.png Type: image/png Size: 980 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 1565 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1314 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1169 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1427 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1370 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1245 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4455 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 1623 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 1599 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 1074 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 981 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 1566 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 1315 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 1170 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image019.png Type: image/png Size: 1428 bytes Desc: image019.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 1371 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.png Type: image/png Size: 1246 bytes Desc: image021.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.png Type: image/png Size: 4456 bytes Desc: image022.png URL: From makaplan at us.ibm.com Thu Jul 21 14:33:47 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 21 Jul 2016 09:33:47 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: I don't know. That said, let's be logical and cautious. Your network performance has got to be comparable to (preferably better than!) your disk/storage system. Think speed, latency, bandwidth, jitter, reliability, security. For a production system with data you care about, that probably means a dedicated/private/reserved channel, probably on private or leased fiber. 
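
One hedged way to put numbers against the speed, latency, jitter and bandwidth qualities Marc lists, before trusting an inter-site link with synchronous replication, is simply to measure the link from a node at each site. The sketch below assumes ping and iperf3 are installed on both ends and that nsdnode3 is a node at the remote site:

    # latency and jitter, 100 samples (look at the min/avg/max/mdev line)
    ping -c 100 nsdnode3

    # sustained throughput with 8 parallel streams for 60 seconds
    # (start 'iperf3 -s' on nsdnode3 first)
    iperf3 -c nsdnode3 -P 8 -t 60

As a rough rule of thumb, the link needs to comfortably exceed the aggregate write bandwidth you expect, since with two copies every write crosses it.
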
Sure you can cobble together a demo, proof-of-concept, or prototype with less than that, but are you going to bet your career, life, friendships, data on that? Then you have to work through and test failure and recover scenarios... This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... Is there a sale or marketing team selling this? What do they recommend? Here is an excerpt from an IBM white paper I found by googling... Notice the qualifier "high quality wide area network": "...Synchronous replication works well for many workloads by replicating data across storage arrays within a data center, within a campus or across geographical distances using high quality wide area network connections. When wide area network connections are not high performance or are not reliable, an asynchronous approach to data replication is required. GPFS 3.5 introduces a feature called Active File Management (AFM). ..." Of course GPFS has improved (and been renamed!) since 3.5 but 4.2 cannot magically compensate for a not-so-high-quality network! From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:34 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Thu Jul 21 15:01:01 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 21 Jul 2016 14:01:01 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: So just to be clear, my DCs are about 1.5kM as the fibre goes. We have dedicated extended SAN fibre and also private multi-10GbE links between the sites with Ethernet fabric switches. Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 14:33 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 21 15:01:49 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 14:01:49 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Well said Marc. I think in IBM?s marketing pitches they make it sound so simple and easy. But this doesn?t take the place of well planned, tested, and properly sized implementations. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, July 21, 2016 at 8:33 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario I don't know. That said, let's be logical and cautious. Your network performance has got to be comparable to (preferably better than!) your disk/storage system. Think speed, latency, bandwidth, jitter, reliability, security. For a production system with data you care about, that probably means a dedicated/private/reserved channel, probably on private or leased fiber. Sure you can cobble together a demo, proof-of-concept, or prototype with less than that, but are you going to bet your career, life, friendships, data on that? Then you have to work through and test failure and recover scenarios... This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... Is there a sale or marketing team selling this? What do they recommend? Here is an excerpt from an IBM white paper I found by googling... Notice the qualifier "high quality wide area network": "...Synchronous replication works well for many workloads by replicating data across storage arrays within a data center, within a campus or across geographical distances using high quality wide area network connections. When wide area network connections are not high performance or are not reliable, an asynchronous approach to data replication is required. GPFS 3.5 introduces a feature called Active File Management (AFM). ..." Of course GPFS has improved (and been renamed!) since 3.5 but 4.2 cannot magically compensate for a not-so-high-quality network! From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:34 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? 
I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eboyd at us.ibm.com Thu Jul 21 15:39:18 2016 From: eboyd at us.ibm.com (Edward Boyd) Date: Thu, 21 Jul 2016 14:39:18 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario @ Mark Bush In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From sjhoward at iu.edu Thu Jul 21 16:21:04 2016 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Thu, 21 Jul 2016 15:21:04 +0000 Subject: [gpfsug-discuss] Performance Issues with SMB/NFS to GPFS Backend Message-ID: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> Hi All, I have a two-site replicate GPFS cluster running GPFS v3.5.0-26. We have recently run into a performane problem while exporting an SMB mount to one of our client labs. Specifically, this lab is attempting to run a MatLab SPM job in the SMB share and seeing sharply degraded performance versus running it over NFS to their own NFS service. The job does time-slice correction on MRI image volumes that result in roughly 15,000 file creates, plus at lease one read and at least one write to each file. Here is a list that briefly describes the time-to-completion for this job, as run under various conditions: 1) Backed by their local fileserver, running over NFS - 5 min 2) Backed by our GPFS, running over SMB - 30 min 3) Backed by our GPFS, running over NFS - 20 min 4) Backed by local disk on our exporting protocol node, over SMB - 6 min 5) Backed by local disk on our exporting protocol node, over NFS - 6 min 6) Back by GPFS, running over GPFS native client on our supercomputer - 2 min >From this list, it seems that the performance problems arise when combining either SMB or NFS with the GPFS backend. 
It is our conclusion that neither SMB nor NFS per se create the problem, exporting a local disk share over either of these protocols yields decent performance. Do you have any insight as to why the combination of the GPFS back-end with either NFS or SMB yields such anemic performance? Can you offer any tuning recommendations that may improve the performance when running over SMB to the GPFS back-end (our preferred method of deployment)? Thank you so much for your help as always! Stewart Howard Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 21 16:44:17 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 21 Jul 2016 11:44:17 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> References: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> Message-ID: [Apologies] It has been pointed out to me that anyone seriously interested in clusters split over multiple sites should ReadTheFineManuals and in particular chapter 6 of the GPFS or Spectrum Scale Advanced Admin Guide. I apologize for anything I said that may have contradicted TFMs. Still it seems any which way you look at it - State of the art, today, this is not an easy plug and play, tab A into slot A, tab B into slot B and we're done - kinda-thing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Jul 21 17:04:48 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 21 Jul 2016 16:04:48 +0000 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support Message-ID: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in particular the putacl and getacl functions) have no support for not following symlinks. Is there some hidden support for gpfs_putacl that will cause it to not deteference symbolic links? Something like the O_NOFOLLOW flag used elsewhere in linux? Thanks! -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Thu Jul 21 18:15:18 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 21 Jul 2016 17:15:18 +0000 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Message-ID: Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. 
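
Coming back to Aaron's question further up about mm*acl/mm*attr and the ACL calls not honouring anything like O_NOFOLLOW: I am not aware of a documented no-follow option either, so one hedged workaround at the command level is to screen the path yourself before touching it. The sketch below uses only a standard shell test plus mmgetacl, is illustrative only, and is inherently racy between the check and the call:

    # refuse to dereference symlinks before fetching an ACL
    f=/gpfs/fs1/some/path
    if [ -L "$f" ]; then
        echo "$f is a symbolic link; refusing to follow it" >&2
    else
        mmgetacl "$f"
    fi

The same pattern applies to mmputacl; at the API level the equivalent would be an lstat() check before calling gpfs_getacl()/gpfs_putacl(), with the same time-of-check/time-of-use caveat.
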
From shankbal at in.ibm.com Fri Jul 22 01:51:53 2016 From: shankbal at in.ibm.com (Shankar Balasubramanian) Date: Fri, 22 Jul 2016 06:21:53 +0530 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 22 09:36:40 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 22 Jul 2016 08:36:40 +0000 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: Hi Ash Our ifcfg files for the bonded interfaces (this applies to GPFS, data and mgmt networks) are set to mode1: BONDING_OPTS="mode=1 miimon=200" If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it setup right as we've never noticed an issue. The specific mode1 was asked for by our networks team. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan Sent: 21 July 2016 11:26 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience Dear all, Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. 
We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. Is there a recommended bonding mode? If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? Thank you, Regards, Ash -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ashish.thandavan at cs.ox.ac.uk Fri Jul 22 09:57:02 2016 From: ashish.thandavan at cs.ox.ac.uk (Ashish Thandavan) Date: Fri, 22 Jul 2016 09:57:02 +0100 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> Hi Richard, Thank you, that is very good to know! Regards, Ash On 22/07/16 09:36, Sobey, Richard A wrote: > Hi Ash > > Our ifcfg files for the bonded interfaces (this applies to GPFS, data and mgmt networks) are set to mode1: > > BONDING_OPTS="mode=1 miimon=200" > > If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it setup right as we've never noticed an issue. The specific mode1 was asked for by our networks team. > > Richard > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan > Sent: 21 July 2016 11:26 > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience > > Dear all, > > Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? > > I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? > > Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. 
However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. > Is there a recommended bonding mode? > > If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? > > Thank you, > > Regards, > Ash > > > > -- > ------------------------- > Ashish Thandavan > > UNIX Support Computing Officer > Department of Computer Science > University of Oxford > Wolfson Building > Parks Road > Oxford OX1 3QD > > Phone: 01865 610733 > Email: ashish.thandavan at cs.ox.ac.uk > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk From mimarsh2 at vt.edu Fri Jul 22 15:39:55 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 22 Jul 2016 10:39:55 -0400 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> Message-ID: Sort of trailing on this thread - Is a bonded active-active 10gig ethernet network enough bandwidth to run data and heartbeat/admin on the same network? I assume it comes down to a question of latency and congestion but would like to hear others' stories. Is anyone doing anything fancy with QOS to make sure admin/heartbeat traffic is not delayed? All of our current clusters use Infiniband for data and mgt traffic, but we are building a cluster that has dual 10gigE to each compute node. The NSD servers have 40gigE connections to the core network where 10gigE switches uplink. On Fri, Jul 22, 2016 at 4:57 AM, Ashish Thandavan < ashish.thandavan at cs.ox.ac.uk> wrote: > Hi Richard, > > Thank you, that is very good to know! > > Regards, > Ash > > > On 22/07/16 09:36, Sobey, Richard A wrote: > >> Hi Ash >> >> Our ifcfg files for the bonded interfaces (this applies to GPFS, data and >> mgmt networks) are set to mode1: >> >> BONDING_OPTS="mode=1 miimon=200" >> >> If we have ever had a network outage on the ports for these interfaces, >> apart from pulling a cable for testing when they went in, then I guess we >> have it setup right as we've never noticed an issue. The specific mode1 was >> asked for by our networks team. >> >> Richard >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org [mailto: >> gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan >> Sent: 21 July 2016 11:26 >> To: gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] GPFS heartbeat network specifications and >> resilience >> >> Dear all, >> >> Please could anyone be able to point me at specifications required for >> the GPFS heartbeat network? 
Are there any figures for latency, jitter, etc >> that one should be aware of? >> >> I also have a related question about resilience. Our three GPFS NSD >> servers utilize a single network port on each server and communicate >> heartbeat traffic over a private VLAN. We are looking at improving the >> resilience of this setup by adding an additional network link on each >> server (going to a different member of a pair of stacked switches than the >> existing one) and running the heartbeat network over bonded interfaces on >> the three servers. Are there any recommendations as to which network >> bonding type to use? >> >> Based on the name alone, Mode 1 (active-backup) appears to be the ideal >> choice, and I believe the switches do not need any special configuration. >> However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might >> be the way to go; this aggregates the two ports and does require the >> relevant switch ports to be configured to support this. >> Is there a recommended bonding mode? >> >> If anyone here currently uses bonded interfaces for their GPFS heartbeat >> traffic, may I ask what type of bond have you configured? Have you had any >> problems with the setup? And more importantly, has it been of use in >> keeping the cluster up and running in the scenario of one network link >> going down? >> >> Thank you, >> >> Regards, >> Ash >> >> >> >> -- >> ------------------------- >> Ashish Thandavan >> >> UNIX Support Computing Officer >> Department of Computer Science >> University of Oxford >> Wolfson Building >> Parks Road >> Oxford OX1 3QD >> >> Phone: 01865 610733 >> Email: ashish.thandavan at cs.ox.ac.uk >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > ------------------------- > Ashish Thandavan > > UNIX Support Computing Officer > Department of Computer Science > University of Oxford > Wolfson Building > Parks Road > Oxford OX1 3QD > > Phone: 01865 610733 > Email: ashish.thandavan at cs.ox.ac.uk > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Fri Jul 22 17:25:49 2016 From: chekh at stanford.edu (Alex Chekholko) Date: Fri, 22 Jul 2016 09:25:49 -0700 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: <81342a19-1cec-14de-9d7f-176ff7511511@stanford.edu> Hi Ashish, Can you describe more about what problem you are trying to solve? And what failure mode you are trying to avoid? GPFS depends on uninterrupted network access between the cluster members (well, mainly between each cluster member and the current cluster manager node), but there are many ways to ensure that, and many ways to recover from interruptions. e.g. we tend to set minMissedPingTimeout 30 pingPeriod 5 Bump those up if network/system gets busy. Performance and latency will suffer but at least cluster members won't be expelled. 
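If it helps, a rough sketch of how those values might be applied and then checked (the numbers are simply the ones above, not a recommendation, and depending on the release the change may only fully take effect after the daemons are recycled):

mmchconfig minMissedPingTimeout=30,pingPeriod=5
mmlsconfig | egrep -i 'minmissedpingtimeout|pingperiod'
mmdiag --config | egrep -i 'minmissedpingtimeout|pingperiod'    # per-node view of the values the daemon is actually using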
Regards, Alex On 07/21/2016 03:26 AM, Ashish Thandavan wrote: > Dear all, > > Please could anyone be able to point me at specifications required for > the GPFS heartbeat network? Are there any figures for latency, jitter, > etc that one should be aware of? > > I also have a related question about resilience. Our three GPFS NSD > servers utilize a single network port on each server and communicate > heartbeat traffic over a private VLAN. We are looking at improving the > resilience of this setup by adding an additional network link on each > server (going to a different member of a pair of stacked switches than > the existing one) and running the heartbeat network over bonded > interfaces on the three servers. Are there any recommendations as to > which network bonding type to use? > > Based on the name alone, Mode 1 (active-backup) appears to be the ideal > choice, and I believe the switches do not need any special > configuration. However, it has been suggested that Mode 4 (802.3ad) or > LACP bonding might be the way to go; this aggregates the two ports and > does require the relevant switch ports to be configured to support this. > Is there a recommended bonding mode? > > If anyone here currently uses bonded interfaces for their GPFS heartbeat > traffic, may I ask what type of bond have you configured? Have you had > any problems with the setup? And more importantly, has it been of use in > keeping the cluster up and running in the scenario of one network link > going down? > > Thank you, > > Regards, > Ash > > > -- Alex Chekholko chekh at stanford.edu From volobuev at us.ibm.com Fri Jul 22 18:56:31 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Fri, 22 Jul 2016 10:56:31 -0700 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: There are multiple ways to accomplish active-active two-site synchronous DR, aka "stretch cluster". The most common approach is to have 3 sites: two main sites A and B, plus tiebreaker site C. The two main sites host all data/metadata disks and each has some even number of quorum nodes. There's no stretched SAN, each site has its own set of NSDs defined. The tiebreaker site consists of a single quorum node with a small descOnly LUN. In this config, any of the 3 sites can go down or be disconnected from the rest without affecting the other two. The tiebreaker site is essential: it provides a quorum node for node majority quorum to function, and a descOnly disk for the file system descriptor quorum. Technically speaking, one can do away with the need to have a quorum node at site C by using "minority quorum", i.e. tiebreaker disks, but this model is more complex and it is harder to predict its behavior under various failure conditions. The basic problem with the minority quorum is that it allows a minority of nodes to win in a network partition scenario, just like the name implies. In the extreme case this leads to the "dictator problem", when a single partitioned node could manage to win the disk election and thus kick everyone else out. And since a tiebreaker disk needs to be visible from all quorum nodes, you do need a stretched SAN that extends between sites. The classic active-active stretch cluster only requires a good TCP/IP network. The question that gets asked a lot is "how good should the network connection between sites be". 
There's no simple answer, unfortunately. It would be completely impractical to try to frame this in simple thresholds. The worse the network connection is, the more pain it produces, but everyone has a different level of pain tolerance. And everyone's workload is different. In any GPFS configuration that uses data replication, writes are impacted far more by replication than reads. So a read-mostly workload may run fine with a dodgy inter-site link, while a write-heavy workload may just run into the ground, as IOs may be submitted faster than they could be completed. The buffering model could make a big difference. An application that does a fair amount of write bursts, with those writes being buffered in a generously sized pagepool, may perform acceptably, while a different application that uses O_SYNC or O_DIRECT semantics for writes may run a lot worse, all other things being equal. As long as all nodes can renew their disk leases within the configured disk lease interval (35 sec by default), GPFS will basically work, so the absolute threshold for the network link quality is not particularly stringent, but beyond that it all depends on your workload and your level of pain tolerance. Practically speaking, you want a network link with low-double-digits RTT at worst, almost no packet loss, and bandwidth commensurate with your application IO needs (fudged some to allow for write amplification -- another factor that's entirely workload-dependent). So a link with, say, 100ms RTT and 2% packet loss is not going to be usable to almost anyone, in my opinion, a link with 30ms RTT and 0.1% packet loss may work for some undemanding read-mostly workloads, and so on. So you pretty much have to try it out to see. The disk configuration is another tricky angle. The simplest approach is to have two groups of data/metadata NSDs, on sites A and B, and not have any sort of SAN reaching across sites. Historically, such a config was actually preferred over a stretched SAN, because it allowed for a basic site topology definition. When multiple replicas of the same logical block are present, it is obviously better/faster to read the replica that resides on a disk that's local to a given site. This is conceptually simple, but how would GPFS know what a site is and what disks are local vs remote? To GPFS, all disks are equal. Historically, the readReplicaPolicy=local config parameter was put forward to work around the problem. The basic idea was: if the reader node is on the same subnet as the primary NSD server for a given replica, this replica is "local", and is thus preferred. This sort of works, but requires a very specific network configuration, which isn't always practical. Starting with GPFS 4.1.1, GPFS implements readReplicaPolicy=fastest, where the best replica for reads is picked based on observed disk IO latency. This is more general and works for all disk topologies, including a stretched SAN. yuri From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list , Date: 07/21/2016 05:45 AM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). 
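One simplified way to picture the first of those layouts (the classic no-stretched-SAN configuration Yuri describes, with each site's LUNs defined as NSDs served only by that site's NSD servers, plus a small descOnly disk at a third site) is as an mmcrnsd stanza file. Every node name, device path and value below is invented purely for illustration:

%nsd: nsd=site1_nsd1 device=/dev/mapper/lun1 servers=nsd1a,nsd1b usage=dataAndMetadata failureGroup=1 pool=system
%nsd: nsd=site2_nsd1 device=/dev/mapper/lun2 servers=nsd2a,nsd2b usage=dataAndMetadata failureGroup=2 pool=system
%nsd: nsd=site3_desc device=/dev/sdb servers=quorum3 usage=descOnly failureGroup=3

mmcrnsd -F stretch.stanza
mmcrfs fs1 -F stretch.stanza -m 2 -M 2 -r 2 -R 2

With default data and metadata replication of 2 across failure groups 1 and 2, each main site holds a complete copy, and the descOnly disk at the third site carries only a file system descriptor. On 4.1.1 and later, mmchconfig readReplicaPolicy=fastest (as described above) can then steer reads toward the lower-latency replica.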
I know I'm being persistent but this for some reason confuses me.

Site1
NSD Node1 ---NSD1 ---Physical LUN1 from SAN1
NSD Node2

Site2
NSD Node3 ---NSD2 --Physical LUN2 from SAN2
NSD Node4

Or

Site1
NSD Node1
----NSD1 --Physical LUN1 from SAN1
----NSD2 --Physical LUN2 from SAN2
NSD Node2

Site 2
NSD Node3
---NSD2 -- Physical LUN2 from SAN2
---NSD1 --Physical LUN1 from SAN1
NSD Node4

Site 3
Node5 Quorum

From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario

Site1                 Site2
Server1 (quorum 1)    Server3 (quorum 2)
Server2               Server4

        SiteX
        Server5 (quorum 3)

You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage.
- With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations.
- With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations.
- With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations.
- etc
Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn't fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don't understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it's when there are two datacenters and I don't want two clusters typically. Mark R. 
Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From dhildeb at us.ibm.com Fri Jul 22 19:00:23 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 22 Jul 2016 11:00:23 -0700 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: Just to expand a bit on the use of peer snapshots. The point of psnap is to create a snapshot in the cache that is identical to a snapshot on the home. This way you can recover files from a snapshot of a fileset on the 'replica' of the data just like you can from a snapshot in the 'cache' (where the data was generated). With IW mode, its typically possible that the data could be changing on the home from another cache or clients directly running on the data on the home. In this case, it would be impossible to ensure that the snapshots in the cache and on the home are identical. Dean From: "Shankar Balasubramanian" To: gpfsug main discussion list Date: 07/21/2016 05:52 PM Subject: Re: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. 
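For reference, a rough sketch of the cache-side command that drives a peer snapshot of a single-writer fileset is below, with the caveats discussed elsewhere in this thread about which backend transports actually support it (the file system and fileset names are invented):

mmpsnap fs1 create -j sw_fileset
mmlssnapshot fs1

A successful peer snapshot corresponds to a matching snapshot at the home site; that correspondence is exactly what cannot be guaranteed for independent-writer filesets, as Dean explains above.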
Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India Inactive hide details for Luke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM anLuke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just k From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From volobuev at us.ibm.com Fri Jul 22 20:24:58 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Fri, 22 Jul 2016 12:24:58 -0700 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Message-ID: In a word, no. I can't blame anyone for suspecting that there's yet another hidden flag somewhere, given our track record, but there's nothing hidden on this one, there's just no code to implement O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be a reasonable thing to have, so if you feel strongly enough about it to open an RFE, go for it. yuri From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: gpfsug main discussion list , Date: 07/21/2016 09:05 AM Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in particular the putacl and getacl functions) have no support for not following symlinks. Is there some hidden support for gpfs_putacl that will cause it to not deteference symbolic links? Something like the O_NOFOLLOW flag used elsewhere in linux? Thanks! -Aaron_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Fri Jul 22 23:36:46 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 22 Jul 2016 18:36:46 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Message-ID: <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Thanks Yuri! I do wonder what security implications this might have for the policy engine where a nefarious user could trick it into performing an action on another file via symlink hijacking. Truthfully I've been more worried about an accidental hijack rather than someone being malicious. I'll open an RFE for it since I think it would be nice to have. (While I'm at it, I think I'll open another for having chown call exposed via the API). -Aaron On 7/22/16 3:24 PM, Yuri L Volobuev wrote: > In a word, no. I can't blame anyone for suspecting that there's yet > another hidden flag somewhere, given our track record, but there's > nothing hidden on this one, there's just no code to implement > O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be > a reasonable thing to have, so if you feel strongly enough about it to > open an RFE, go for it. > > yuri > > Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER > SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, > Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 > AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) > and API calls (in particular the > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list , > Date: 07/21/2016 09:05 AM > Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in > particular the putacl and getacl functions) have no support for not > following symlinks. Is there some hidden support for gpfs_putacl that > will cause it to not deteference symbolic links? Something like the > O_NOFOLLOW flag used elsewhere in linux? > > Thanks! > > -Aaron_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: OpenPGP digital signature URL: From aaron.s.knister at nasa.gov Sat Jul 23 05:46:30 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sat, 23 Jul 2016 00:46:30 -0400 Subject: [gpfsug-discuss] inode update delay? Message-ID: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. 
I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From shankbal at in.ibm.com Fri Jul 22 08:53:51 2016 From: shankbal at in.ibm.com (Shankar Balasubramanian) Date: Fri, 22 Jul 2016 13:23:51 +0530 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: One correction to the note below, peer snapshots are not supported when AFM use GPFS protocol. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India From: Shankar Balasubramanian/India/IBM at IBMIN To: gpfsug main discussion list Date: 07/22/2016 06:22 AM Subject: Re: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India Inactive hide details for Luke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM anLuke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just k From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 
1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From MKEIGO at jp.ibm.com Sun Jul 24 03:31:05 2016 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Sun, 24 Jul 2016 11:31:05 +0900 Subject: [gpfsug-discuss] inode update delay? In-Reply-To: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: Hi Aaron, I think the product is designed so that some inode fields are not propagated among nodes instantly in order to avoid unnecessary overhead within the cluster. See: Exceptions to Open Group technical standards - IBM Spectrum Scale: Administration and Programming Reference - IBM Spectrum Scale 4.2 - IBM Knowledge Center https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_xopen.htm --- Keigo Matsubara, Industry Architect, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 From: Aaron Knister To: Date: 2016/07/23 13:47 Subject: [gpfsug-discuss] inode update delay? Sent by: gpfsug-discuss-bounces at spectrumscale.org I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stef.coene at docum.org Sun Jul 24 11:27:28 2016 From: stef.coene at docum.org (Stef Coene) Date: Sun, 24 Jul 2016 12:27:28 +0200 Subject: [gpfsug-discuss] New to GPFS Message-ID: <57949810.2030002@docum.org> Hi, Like the subject says, I'm new to Spectrum Scale. We are considering GPFS as back end for CommVault back-up data. Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client (Ubuntu) as test on ESXi 6. The RHEL servers are upgraded to 7.2. Will that be a problem or not? I saw some posts that there is an issue with RHEL 7.2.... Stef From makaplan at us.ibm.com Sun Jul 24 16:11:06 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 24 Jul 2016 11:11:06 -0400 Subject: [gpfsug-discuss] inode update delay? / mmapplypolicy In-Reply-To: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: mmapplypolicy uses the inodescan API which to gain overall speed, bypasses various buffers, caches, locks, ... and just reads inodes "directly" from disk. So the "view" of inodescan is somewhat "behind" the overall state of the live filesystem as viewed from the usual Posix APIs, such as stat(2). (Not to worry, all metadata updates are logged, so in event of a power loss or OS crash, GPFS recovers a consistent state from its log files...) This is at least mentioned in the docs. `mmfsctl suspend-write; mmfsctl resume;` is the only practical way I know to guarantee a forced a flush of all "dirty" buffers to disk -- any metadata updates before the suspend will for sure become visible to an inodescan after the resume. (Classic `sync` is not quite the same...) But think about this --- scanning a "live" file system is always somewhat iffy-dodgy and the result is smeared over the time of the scan -- if there are any concurrent changes during the scan your results are imprecise. An alternative is to use `mmcrsnapshot` and scan the snapshot. From: Aaron Knister To: Date: 07/23/2016 12:46 AM Subject: [gpfsug-discuss] inode update delay? Sent by: gpfsug-discuss-bounces at spectrumscale.org I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! 
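To make the snapshot-based alternative concrete, a rough sketch (the file system, snapshot and policy file names are invented; the policy file would hold whatever LIST/EXTERNAL rules the scan needs):

mmcrsnapshot fs1 scan1
mmapplypolicy /gpfs/fs1/.snapshots/scan1 -P acl_scan.policy -L 1
mmdelsnapshot fs1 scan1

Scanning the snapshot path gives a stable, point-in-time view, at the cost of creating and deleting the snapshot around each scan.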
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Jul 24 16:54:16 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 24 Jul 2016 11:54:16 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Message-ID: Regarding "policy engine"/inodescan and symbolic links. 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be tested to see if an inode/file is a symlink or not. 2. Default behaviour for mmapplypolicy is to skip over symlinks. You must specify... DIRECTORIES_PLUS which ... Indicates that non-regular file objects (directories, symbolic links, and so on) should be included in the list. If not specified, only ordinary data files are included in the candidate lists. 3. You can apply Linux commands and APIs to GPFS pathnames. 4. Of course, if you need to get at a GPFS feature or attribute that is not supported by Linux ... 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, but neither does it set the ACL for the symlink... Googling... some people consider this to be a bug, but maybe it is a feature... --marc From: Aaron Knister To: Date: 07/22/2016 06:37 PM Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Yuri! I do wonder what security implications this might have for the policy engine where a nefarious user could trick it into performing an action on another file via symlink hijacking. Truthfully I've been more worried about an accidental hijack rather than someone being malicious. I'll open an RFE for it since I think it would be nice to have. (While I'm at it, I think I'll open another for having chown call exposed via the API). -Aaron On 7/22/16 3:24 PM, Yuri L Volobuev wrote: > In a word, no. I can't blame anyone for suspecting that there's yet > another hidden flag somewhere, given our track record, but there's > nothing hidden on this one, there's just no code to implement > O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be > a reasonable thing to have, so if you feel strongly enough about it to > open an RFE, go for it. > > yuri > > Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER > SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, > Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 > AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) > and API calls (in particular the > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list , > Date: 07/21/2016 09:05 AM > Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in > particular the putacl and getacl functions) have no support for not > following symlinks. 
Is there some hidden support for gpfs_putacl that > will cause it to not deteference symbolic links? Something like the > O_NOFOLLOW flag used elsewhere in linux? > > Thanks! > > -Aaron_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Mon Jul 25 00:15:02 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 24 Jul 2016 23:15:02 +0000 Subject: [gpfsug-discuss] New to GPFS In-Reply-To: <57949810.2030002@docum.org> References: <57949810.2030002@docum.org> Message-ID: <43489850f79c446c9d9896292608a292@exch1-cdc.nexus.csiro.au> The issue is with the Protocols version of GPFS. I am using the non-protocols version 4.2.0.3 successfully on CentOS 7.2. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stef Coene Sent: Sunday, 24 July 2016 8:27 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] New to GPFS Hi, Like the subject says, I'm new to Spectrum Scale. We are considering GPFS as back end for CommVault back-up data. Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client (Ubuntu) as test on ESXi 6. The RHEL servers are upgraded to 7.2. Will that be a problem or not? I saw some posts that there is an issue with RHEL 7.2.... Stef _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mweil at wustl.edu Mon Jul 25 15:56:52 2016 From: mweil at wustl.edu (Matt Weil) Date: Mon, 25 Jul 2016 09:56:52 -0500 Subject: [gpfsug-discuss] New to GPFS In-Reply-To: <57949810.2030002@docum.org> References: <57949810.2030002@docum.org> Message-ID: On 7/24/16 5:27 AM, Stef Coene wrote: > Hi, > > Like the subject says, I'm new to Spectrum Scale. > > We are considering GPFS as back end for CommVault back-up data. > Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). > I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client > (Ubuntu) as test on ESXi 6. > > The RHEL servers are upgraded to 7.2. Will that be a problem or not? > I saw some posts that there is an issue with RHEL 7.2.... we had to upgrade to 4.2.0.3 when running RHEL 7.2 > > > Stef > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From aaron.s.knister at nasa.gov Mon Jul 25 20:50:54 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 25 Jul 2016 15:50:54 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Message-ID: <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> Thanks Marc. In my mind the issue is a timing one between the moment the policy engine decides to perform an action on a file (e.g. matching the path inode/gen number with that from the inode scan) and when it actually takes that action by calling an api call that takes a path as an argument. Your suggestion in #3 is the route I think I'm going to take here since I can call acl_get_fd after calling open/openat with O_NOFOLLOW. On 7/24/16 11:54 AM, Marc A Kaplan wrote: > Regarding "policy engine"/inodescan and symbolic links. > > 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be > tested to see if an inode/file is a symlink or not. > > 2. Default behaviour for mmapplypolicy is to skip over symlinks. You > must specify... > > *DIRECTORIES_PLUS which ...* > > Indicates that non-regular file objects (directories, symbolic links, > and so on) should be included in > the list. If not specified, only ordinary data files are included in the > candidate lists. > > 3. You can apply Linux commands and APIs to GPFS pathnames. > > 4. Of course, if you need to get at a GPFS feature or attribute that is > not supported by Linux ... > > 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, > but neither does it set the ACL for the symlink... > Googling... some people consider this to be a bug, but maybe it is a > feature... > > --marc > > > > From: Aaron Knister > To: > Date: 07/22/2016 06:37 PM > Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Thanks Yuri! I do wonder what security implications this might have for > the policy engine where a nefarious user could trick it into performing > an action on another file via symlink hijacking. Truthfully I've been > more worried about an accidental hijack rather than someone being > malicious. I'll open an RFE for it since I think it would be nice to > have. (While I'm at it, I think I'll open another for having chown call > exposed via the API). > > -Aaron > > On 7/22/16 3:24 PM, Yuri L Volobuev wrote: >> In a word, no. I can't blame anyone for suspecting that there's yet >> another hidden flag somewhere, given our track record, but there's >> nothing hidden on this one, there's just no code to implement >> O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be >> a reasonable thing to have, so if you feel strongly enough about it to >> open an RFE, go for it. >> >> yuri >> >> Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER >> SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, >> Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 >> AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) >> and API calls (in particular the >> >> From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" >> >> To: gpfsug main discussion list , >> Date: 07/21/2016 09:05 AM >> Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Everyone, >> >> I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in >> particular the putacl and getacl functions) have no support for not >> following symlinks. Is there some hidden support for gpfs_putacl that >> will cause it to not deteference symbolic links? Something like the >> O_NOFOLLOW flag used elsewhere in linux? >> >> Thanks! >> >> -Aaron_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Mon Jul 25 20:57:25 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 25 Jul 2016 15:57:25 -0400 Subject: [gpfsug-discuss] inode update delay? / mmapplypolicy In-Reply-To: References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: <291e1237-98d6-2abe-b1af-8898da61629f@nasa.gov> Thanks again, Marc. You're quite right about the results being smeared over time on a live filesystem even if the inodescan didn't lag behind slightly. The use case here is a mass uid number migration. File ownership is easy because I can be guaranteed after a certain point in time that no new files under the user's old uid number can be created. However, in part because of inheritance I'm not so lucky when it comes to ACLs. I almost need to do 2 passes when looking at the ACLs but even that's not guaranteed to catch everything. Using a snapshot is an interesting idea to give me a stable point in time snapshot to determine if I got everything. -Aaron On 7/24/16 11:11 AM, Marc A Kaplan wrote: > mmapplypolicy uses the inodescan API which to gain overall speed, > bypasses various buffers, caches, locks, ... and just reads inodes > "directly" from disk. > > So the "view" of inodescan is somewhat "behind" the overall state of the > live filesystem as viewed from the usual Posix APIs, such as stat(2). > (Not to worry, all metadata updates are logged, so in event of a power > loss or OS crash, GPFS recovers a consistent state from its log files...) > > This is at least mentioned in the docs. 
> > `mmfsctl suspend-write; mmfsctl resume;` is the only practical way I > know to guarantee a forced a flush of all "dirty" buffers to disk -- any > metadata updates before the suspend will for sure > become visible to an inodescan after the resume. (Classic `sync` is not > quite the same...) > > But think about this --- scanning a "live" file system is always > somewhat iffy-dodgy and the result is smeared over the time of the scan > -- if there are any concurrent changes > during the scan your results are imprecise. > > An alternative is to use `mmcrsnapshot` and scan the snapshot. > > > > > From: Aaron Knister > To: > Date: 07/23/2016 12:46 AM > Subject: [gpfsug-discuss] inode update delay? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > I've noticed that there can be a several minute delay between the time > changes to an inode occur and when those changes are reflected in the > results of an inode scan. I've been working on code that checks ia_xperm > to determine if a given file has extended acl entries and noticed in > testing it that the acl flag wasn't getting set immediately after giving > a file an acl. Here's what I mean: > > # cd /gpfsm/dnb32 > > # date; setfacl -b acltest* > Sat Jul 23 00:24:57 EDT 2016 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:24:59 EDT 2016 > 5 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:25:10 EDT 2016 > 5 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:25:21 EDT 2016 > 0 > > I'm a little confused about what's going on here-- is there some kind of > write-behind for inode updates? Is there a way I can cause the cluster > to quiesce and flush all pending inode updates (an mmfsctl suspend and > resume seem to have this effect but I was looking for something a little > less user-visible)? If I access the directory containing the files from > another node via the VFS mount then the update appears immediately in > the inode scan. A mere inode scan from another node w/o touching the > filesystem mount doesn't necessarily seem to trigger this behavior. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From makaplan at us.ibm.com Mon Jul 25 22:46:01 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 25 Jul 2016 17:46:01 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov><9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> Message-ID: Unfortunately there is always a window of time between testing the file and acting on the file's pathname. At any moment after testing (finding) ... the file could change, or the same pathname could be pointing to a different inode/file. 
That is a potential problem with just about every Unix file utility and/or script you put together with the standard commands... find ... | xargs ... mmapplypolicy has the -e option to narrow the window by retesting just before executing an action. Of course it's seldom a real problem -- you have to think about scenarios where two minds are working within the same namespace of files and then they are doing so either carelessly without communicating or one is deliberately trying to cause trouble for the other! From: Aaron Knister To: Date: 07/25/2016 03:51 PM Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Marc. In my mind the issue is a timing one between the moment the policy engine decides to perform an action on a file (e.g. matching the path inode/gen number with that from the inode scan) and when it actually takes that action by calling an api call that takes a path as an argument. Your suggestion in #3 is the route I think I'm going to take here since I can call acl_get_fd after calling open/openat with O_NOFOLLOW. On 7/24/16 11:54 AM, Marc A Kaplan wrote: > Regarding "policy engine"/inodescan and symbolic links. > > 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be > tested to see if an inode/file is a symlink or not. > > 2. Default behaviour for mmapplypolicy is to skip over symlinks. You > must specify... > > *DIRECTORIES_PLUS which ...* > > Indicates that non-regular file objects (directories, symbolic links, > and so on) should be included in > the list. If not specified, only ordinary data files are included in the > candidate lists. > > 3. You can apply Linux commands and APIs to GPFS pathnames. > > 4. Of course, if you need to get at a GPFS feature or attribute that is > not supported by Linux ... > > 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, > but neither does it set the ACL for the symlink... > Googling... some people consider this to be a bug, but maybe it is a > feature... > > --marc > > > > From: Aaron Knister > To: > Date: 07/22/2016 06:37 PM > Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Thanks Yuri! I do wonder what security implications this might have for > the policy engine where a nefarious user could trick it into performing > an action on another file via symlink hijacking. Truthfully I've been > more worried about an accidental hijack rather than someone being > malicious. I'll open an RFE for it since I think it would be nice to > have. (While I'm at it, I think I'll open another for having chown call > exposed via the API). > > -Aaron > > On 7/22/16 3:24 PM, Yuri L Volobuev wrote: >> In a word, no. I can't blame anyone for suspecting that there's yet >> another hidden flag somewhere, given our track record, but there's >> nothing hidden on this one, there's just no code to implement >> O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be >> a reasonable thing to have, so if you feel strongly enough about it to >> open an RFE, go for it. >> >> yuri >> >> Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER >> SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, >> Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 >> AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) >> and API calls (in particular the >> >> From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" >> >> To: gpfsug main discussion list , >> Date: 07/21/2016 09:05 AM >> Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Everyone, >> >> I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in >> particular the putacl and getacl functions) have no support for not >> following symlinks. Is there some hidden support for gpfs_putacl that >> will cause it to not deteference symbolic links? Something like the >> O_NOFOLLOW flag used elsewhere in linux? >> >> Thanks! >> >> -Aaron_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Tue Jul 26 15:17:35 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 26 Jul 2016 14:17:35 +0000 Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
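As a point of reference, the prefetch into cache B can be queued with something like the following (the file system name cacheBfs, fileset name dataset01, and list-file path are illustrative):

# build a list of files belonging to the dataset being migrated
find /cacheB/dataset01 -type f > /tmp/dataset01.list

# queue the prefetch for that fileset on its gateway node
mmafmctl cacheBfs prefetch -j dataset01 --list-file /tmp/dataset01.list

# watch the fileset state and queue while the prefetch runs
mmafmctl cacheBfs getstate -j dataset01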
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From bbanister at jumptrading.com Wed Jul 27 18:37:37 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 17:37:37 +0000 Subject: [gpfsug-discuss] CCR troubles Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I'll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Jul 27 19:03:05 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 27 Jul 2016 14:03:05 -0400 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. 
can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Jul 27 23:29:19 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 22:29:19 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... 
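A quick way to sanity-check that requirement when commands fail with err 809 is sketched below (node names are illustrative; mmsdrserv is the configuration-server process which, as noted later in this thread, should be running on every quorum node):

# identify the quorum nodes -- see the Designation column
mmlscluster

# on each quorum node, confirm the configuration server process is up;
# running almost any mm command (mmlscluster, for example) restarts it if not
ps -C mmsdrserv -o pid,args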
From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gmcpheeters at anl.gov Wed Jul 27 23:34:50 2016 From: gmcpheeters at anl.gov (McPheeters, Gordon) Date: Wed, 27 Jul 2016 22:34:50 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. 
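For completeness, the fallback Gordon describes looks roughly like this (run from a node with administrative access to the whole cluster; as noted, all nodes must be down before CCR is disabled):

# stop GPFS everywhere, revert to primary/backup configuration servers, restart
mmshutdown -a
mmchcluster --ccr-disable
mmstartup -a

CCR can be re-enabled afterwards with mmchcluster --ccr-enable, as described in the option text quoted earlier in the thread.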
Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Wed Jul 27 23:44:27 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 22:44:27 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Right, I know that I can disable CCR, and I?m asking if this seemingly broken behavior of GPFS commands when the cluster is down was the expected mode of operation with CCR enabled. Sounds like it from the responses thus far. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McPheeters, Gordon Sent: Wednesday, July 27, 2016 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... 
From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsanjay at us.ibm.com Thu Jul 28 00:04:35 2016 From: gsanjay at us.ibm.com (Sanjay Gandhi) Date: Wed, 27 Jul 2016 16:04:35 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 54, Issue 63 In-Reply-To: References: Message-ID: Check mmsdrserv is running on all quorum nodes. mmlscluster should start mmsdrserv if it is not running. Thanks, Sanjay Gandhi GPFS FVT IBM, Beaverton Phone/FAX : 503-578-4141 T/L 775-4141 gsanjay at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 03:44 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 63 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: CCR troubles (Bryan Banister) ---------------------------------------------------------------------- Message: 1 Date: Wed, 27 Jul 2016 22:44:27 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D at CHI-EXCHANGEW1.w2k.jumptrading.com> Content-Type: text/plain; charset="utf-8" Right, I know that I can disable CCR, and I?m asking if this seemingly broken behavior of GPFS commands when the cluster is down was the expected mode of operation with CCR enabled. Sounds like it from the responses thus far. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McPheeters, Gordon Sent: Wednesday, July 27, 2016 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. 
I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com< http://fpia-gpfs-jcsdr01.grid.jumptrading.com/>. mmchconfig: Command failed. Examine previous error messages to determine cause. 
Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160727/ea365c46/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 63 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Thu Jul 28 05:23:34 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 28 Jul 2016 06:23:34 +0200 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: From radhika.p at in.ibm.com Thu Jul 28 06:43:13 2016 From: radhika.p at in.ibm.com (Radhika A Parameswaran) Date: Thu, 28 Jul 2016 11:13:13 +0530 Subject: [gpfsug-discuss] Re. AFM Crashing the MDS In-Reply-To: References: Message-ID: Luke, AFM is not tested for cascading configurations, this is getting added into the documentation for 4.2.1: "Cascading of AFM caches is not tested." Thanks and Regards Radhika From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 04:30 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 59 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. AFM Crashing the MDS (Luke Raimbach) ---------------------------------------------------------------------- Message: 1 Date: Tue, 26 Jul 2016 14:17:35 +0000 From: Luke Raimbach To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Content-Type: text/plain; charset="utf-8" Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly:

[... crash log identical to Luke's original message earlier in this thread, trimmed ...]

Ugly, right?

Cheers,
Luke.

Luke Raimbach
Senior HPC Data and Storage Systems Engineer,
The Francis Crick Institute,
Gibbs Building,
215 Euston Road,
London NW1 2BE.
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 59 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Thu Jul 28 09:30:59 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 28 Jul 2016 08:30:59 +0000 Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Dear Radhika, In the early days of AFM and at two separate GPFS UK User Group meetings, I discussed AFM cache chaining with IBM technical people plus at least one developer. My distinct recollection of the outcome was that cache chaining was supported. Nevertheless, the difference between what my memory tells me and what is being reported now is irrelevant. We are stuck with large volumes of data being migrated in this fashion, so there is clearly a customer use case for chaining AFM caches. It would be much more helpful if IBM could take on this case and look at the suspected bug that's been chased out here. Real world observation in the field is that queuing large numbers of metadata updates on the MDS itself causes this crash, whereas issuing the updates from another node in the cache cluster adds to the MDS queue and the crash does not happen. My guess is that there is a bug whereby daemon-local additions to the MDS queue aren't handled correctly (further speculation is that there is a memory leak for local MDS operations, but that needs more testing which I don't have time for - perhaps IBM could try it out?); however, when a metadata update operation is sent through an RPC from another node, it is added to the queue and handled correctly. A workaround, if you will. Other minor observations here are that the further down the chain of caches you are, the larger you should set afmDisconnectTimeout as any intermediate cache recovery time needs to be taken into account following a disconnect event. Initially, this was slightly counterintuitive because caches B and C as described below are connected over multiple IB interfaces and shouldn't disconnect except when there's some other failure. Conversely, the connection between cache A and B is over a very flaky wide area network and although we've managed to tune out a lot of the problems introduced by high and variable latency, the view of cache A from cache B's perspective still sometimes gets suspended. The failure observed above doesn't really feel like it's an artefact of cascading caches, but a bug in MDS code as described. Sharing background information about the cascading cache setup was in the spirit of the mailing list and might have led IBM or other customers attempting this kind of setup to share some of their experiences. Hope you can help. Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Radhika A Parameswaran Sent: 28 July 2016 06:43 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Re. 
AFM Crashing the MDS

Luke,

AFM is not tested for cascading configurations; this is being added to the documentation for 4.2.1: "Cascading of AFM caches is not tested."

Thanks and Regards
Radhika

From: gpfsug-discuss-request at spectrumscale.org
To: gpfsug-discuss at spectrumscale.org
Date: 07/27/2016 04:30 PM
Subject: gpfsug-discuss Digest, Vol 54, Issue 59
Sent by: gpfsug-discuss-bounces at spectrumscale.org
________________________________

Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org

You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..."

Today's Topics:

   1. AFM Crashing the MDS (Luke Raimbach)

----------------------------------------------------------------------

Message: 1
Date: Tue, 26 Jul 2016 14:17:35 +0000
From: Luke Raimbach
To: gpfsug main discussion list
Subject: [gpfsug-discuss] AFM Crashing the MDS
Message-ID:
Content-Type: text/plain; charset="utf-8"

Hi All,

Anyone seen GPFS barf like this before? I'll explain the setup:

- an RO AFM cache on the remote site (cache A) for reading remote datasets quickly;
- an LU AFM cache at the destination site (cache B) for caching data from cache A (a local compute cluster mounts this over multi-cluster);
- an IW AFM cache at the destination site (cache C) for presenting cache B over NAS protocols.

Reading files in cache C should pull data from the remote source through cache A->B->C. Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown).

To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source.
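For anyone unfamiliar with AFM prefetching, a minimal sketch of what that warm-up step can look like is below. The filesystem name (cacheB_fs), fileset name (dataset01) and list-file path are made-up placeholders, and the exact mmafmctl prefetch options vary between releases (e.g. --list-file versus the older --home-list-file), so check the mmafmctl man page for your level rather than treating this as the exact procedure used here:

# Build a list of fully-qualified paths to warm in the cache fileset.
# (List files can also be produced at the home site or by a policy scan.)
find /gpfs/cacheB_fs/dataset01 -type f > /tmp/dataset01.list

# Queue the prefetch for the cache fileset on its gateway node.
mmafmctl cacheB_fs prefetch -j dataset01 --list-file /tmp/dataset01.list

# Watch the fileset state and queue while the prefetch runs.
mmafmctl cacheB_fs getstate -j dataset01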
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk
W: www.crick.ac.uk

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

End of gpfsug-discuss Digest, Vol 54, Issue 59
**********************************************

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From radhika.p at in.ibm.com Thu Jul 28 16:04:44 2016
From: radhika.p at in.ibm.com (Radhika A Parameswaran)
Date: Thu, 28 Jul 2016 20:34:44 +0530
Subject: [gpfsug-discuss] AFM Crashing the MDS
In-Reply-To:
References:
Message-ID:

Hi Luke,

We are explicitly adding cascading to the 4.2.1 documentation as not tested, as we saw a few issues during our in-house testing and the tests are not complete. For this specific use case, we can give it a try and get back to you at your personal ID.

Thanks and Regards
Radhika

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mimarsh2 at vt.edu Thu Jul 28 16:39:09 2016
From: mimarsh2 at vt.edu (Brian Marshall)
Date: Thu, 28 Jul 2016 11:39:09 -0400
Subject: [gpfsug-discuss] GPFS on Broadwell processor
Message-ID:

All,

Is there anything special (BIOS option / kernel option) that needs to be done when running GPFS on a Broadwell-powered NSD server?

Thank you,
Brian

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jamiedavis at us.ibm.com Thu Jul 28 16:48:06 2016
From: jamiedavis at us.ibm.com (James Davis)
Date: Thu, 28 Jul 2016 15:48:06 +0000
Subject: [gpfsug-discuss] GPFS on Broadwell processor
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From makaplan at us.ibm.com Thu Jul 28 18:24:52 2016
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Thu, 28 Jul 2016 13:24:52 -0400
Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown
In-Reply-To:
References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com>
Message-ID:

Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR.

In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmccrmonitor, and then issue commands like: mmlscluster, mmlsconfig, mmchconfig. Those will work correctly and, by the way, re-start mmsdrserv and mmccrmonitor... (Use a command like `ps auxw | grep mm` to find the relevant processes.) But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down...

For the following commands Linux was "up" on all nodes, but GPFS was shut down.
[root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jul 28 18:57:53 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 17:57:53 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn't a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. 
I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? 
Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Thu Jul 28 19:14:05 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 18:14:05 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Hi Marc, So this issue is actually caused by our Systemd setup. We have fully converted over to Systemd to manage the dependency chain needed for GPFS to start properly and also our scheduling system after that. The issue is that when we shutdown GPFS with Systemd this apparently is causing the mmsdrserv and mmccrmonitor processes to also be killed/term'd, probably because these are started in the same CGROUP as GPFS and Systemd kills all processes in this CGROUP when GPFS is stopped. Not sure how to proceed with safeguarding these daemons from Systemd... and real Systemd support in GPFS is basically non-existent at this point. So my problem is actually a Systemd problem, not a CCR problem! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, July 28, 2016 12:58 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn't a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. 
In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? 
Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
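Regarding the cgroup behaviour described at the start of this message: one possible way to safeguard mmsdrserv and mmccrmonitor would be to relax the kill mode of the site-specific GPFS unit, so that "systemctl stop" only signals the unit's main process instead of sweeping the whole cgroup. A minimal sketch follows; the unit name gpfs.service and the drop-in path are assumptions about a locally written unit, not something shipped with GPFS at this level:

# Hypothetical drop-in for a locally written gpfs.service unit.
# KillMode=process makes "systemctl stop" signal only the unit's main process
# rather than every process in the unit's cgroup, so helper daemons such as
# mmsdrserv and mmccrmonitor are left running after a stop.
mkdir -p /etc/systemd/system/gpfs.service.d
cat > /etc/systemd/system/gpfs.service.d/killmode.conf <<'EOF'
[Service]
KillMode=process
EOF
systemctl daemon-reload

The trade-off is that systemd no longer cleans up stray children when the unit stops, so the unit's stop action (e.g. mmshutdown) stays responsible for bringing mmfsd and its helpers down when that is really wanted.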
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From makaplan at us.ibm.com Thu Jul 28 19:23:49 2016
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Thu, 28 Jul 2016 14:23:49 -0400
Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com>
Message-ID:

I think the idea is that you should not need to know the details of how ccr and sdrserv are implemented nor how they work. At this moment, I don't! Literally, I just installed GPFS and defined my system with mmcrcluster and so forth and "it just works". As I wrote, just running mmlscluster or mmlsconfig or similar configuration create, list, change, delete commands should start up ccr and sdrserv under the covers.

Okay, now "I hear you" -- it ain't working for you today. Presumably it did a while ago? Let's think about that...

Troubleshooting 0, 1, 2 in order of suspicion:

0. Check that you can ping and ssh from each quorum node to every other quorum node. Q*(Q-1) tests.

1. Check that you have plenty of free space in /var on each quorum node. Hmmm... we're not talking huge, but see if /var/mmfs/tmp is filled with junk.... Before and after clearing most of that out I had and have:

[root at bog-wifi ~]# du -shk /var/mmfs
84532 /var/mmfs
## clean all big and old files out of /var/mmfs/tmp
[root at bog-wifi ~]# du -shk /var/mmfs
9004 /var/mmfs

Because we know that /var/mmfs is where GPFS stores configuration "stuff".

2. Check that we have the GPFS software correctly installed on each quorum node:

rpm -qa gpfs.* | xargs rpm --verify

From: Bryan Banister
To: gpfsug main discussion list
Date: 07/28/2016 01:58 PM
Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown
Sent by: gpfsug-discuss-bounces at spectrumscale.org

I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides.

[root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr
No manual entry for mmccr

There isn't a help for mmccr either, but at least it does print some usage info:

[root at fpia-gpfs-jcsdr01 ~]# mmccr -h
Unknown subcommand: '-h'
Usage: mmccr subcommand common-options subcommand-options...

Subcommands:

Setup and Initialization:
[snip]

I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible?

Thanks for sharing details about how this all works Marc, I do appreciate your response!
-Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... 
[root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From oehmes at gmail.com Thu Jul 28 19:27:20 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 28 Jul 2016 11:27:20 -0700 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: they should get started as soon as you shutdown via mmshutdown could you check a node where the processes are NOT started and simply run mmshutdown on this node to see if they get started ? On Thu, Jul 28, 2016 at 10:57 AM, Bryan Banister wrote: > I now see that these mmccrmonitor and mmsdrserv daemons are required for > the CCR operations to work. This is just not clear in the error output. > Even the GPFS 4.2 Problem Determination Guide doesn?t have anything > explaining the ?Not enough CCR quorum nodes available? or ?Unexpected error > from ccr fget mmsdrfs? error messages. Thus there is no clear direction on > how to fix this issue from the command output, the man pages, nor the Admin > Guides. > > > > [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr > > No manual entry for mmccr > > > > There isn?t a help for mmccr either, but at least it does print some usage > info: > > > > [root at fpia-gpfs-jcsdr01 ~]# mmccr -h > > Unknown subcommand: '-h'Usage: mmccr subcommand common-options > subcommand-options... > > > > Subcommands: > > > > Setup and Initialization: > > [snip] > > > > I?m still not sure how to start these mmccrmonitor and mmsdrserv daemons > without starting GPFS? could you tell me how it would be possible? > > > > Thanks for sharing details about how this all works Marc, I do appreciate > your response! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of *Marc A Kaplan > *Sent:* Thursday, July 28, 2016 12:25 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig > commands fine with mmshutdown > > > > Based on experiments on my test cluster, I can assure you that you can > list and change GPFS configuration parameters with CCR enabled while GPFS > is down. > > I understand you are having a problem with your cluster, but you are > incorrectly disparaging the CCR. 
> > In fact you can mmshutdown -a AND kill all GPFS related processes, > including mmsdrserv and mmcrmonitor and then issue commands like: > > mmlscluster, mmlsconfig, mmchconfig > > Those will work correctly and by-the-way re-start mmsdrserv and > mmcrmonitor... > (Use command like `ps auxw | grep mm` to find the relevenat processes). > > But that will not start the main GPFS file manager process mmfsd. GPFS > "proper" remains down... > > For the following commands Linux was "up" on all nodes, but GPFS was > shutdown. > [root at n2 gpfs-git]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 down > 4 n5 down > 6 n3 down > > However if a majority of the quorum nodes can not be obtained, you WILL > see a sequence of messages like this, after a noticeable "timeout": > (For the following test I had three quorum nodes and did a Linux shutdown > on two of them...) > > [root at n2 gpfs-git]# mmlsconfig > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmlsconfig: Command failed. Examine previous error messages to determine > cause. > > [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 > mmchconfig: Unable to obtain the GPFS configuration file lock. > mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. > mmchconfig: Command failed. Examine previous error messages to determine > cause. > > [root at n2 gpfs-git]# mmgetstate -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmgetstate: Command failed. Examine previous error messages to determine > cause. > > HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it > should check! > > Then re-starting Linux... So I have two of three quorum nodes active, but > GPFS still down... > > ## From n2, login to node n3 that I just rebooted... > [root at n2 gpfs-git]# ssh n3 > Last login: Thu Jul 28 09:50:53 2016 from n2.frozen > > ## See if any mm processes are running? ... NOPE! > > [root at n3 ~]# ps auxw | grep mm > ps auxw | grep mm > root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep > --color=auto mm > > ## Check the state... notice n4 is powered off... 
> [root at n3 ~]# mmgetstate -a > mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 unknown > 4 n5 down > 6 n3 down > > ## Examine the cluster configuration > [root at n3 ~]# mmlscluster > mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: madagascar.frozen > GPFS cluster id: 7399668614468035547 > GPFS UID domain: madagascar.frozen > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: n2.frozen (not in use) > Secondary server: n4.frozen (not in use) > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------- > 1 n2.frozen 172.20.0.21 n2.frozen > quorum-manager-perfmon > 3 n4.frozen 172.20.0.23 n4.frozen > quorum-manager-perfmon > 4 n5.frozen 172.20.0.24 n5.frozen perfmon > 6 n3.frozen 172.20.0.22 n3.frozen > quorum-manager-perfmon > > ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd > > [root at n3 ~]# ps auxw | grep mm > ps auxw | grep mm > root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes > root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep > --color=auto mm > > ## Now I can mmchconfig ... while GPFS remains down. > > [root at n3 ~]# mmchconfig worker1Threads=1022 > mmchconfig worker1Threads=1022 > mmchconfig: Command successfully completed > mmchconfig: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: > mmsdrfs propagation started > Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation > completed; mmdsh rc=0 > > [root at n3 ~]# mmgetstate -a > mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 unknown > 4 n5 down > 6 n3 down > > ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. > [root at n3 ~]# ping -c 1 n4 > ping -c 1 n4 > PING n4.frozen (172.20.0.23) 56(84) bytes of data. > From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable > > --- n4.frozen ping statistics --- > 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms > > [root at n3 ~]# exit > exit > logout > Connection to n3 closed. > [root at n2 gpfs-git]# ps auwx | grep mm > root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep > --color=auto mm > root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 > root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python > /usr/lpp/mmfs/bin/mmsysmon.py > [root at n2 gpfs-git]# > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. 
> If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 19:39:48 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 14:39:48 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: My experiments show that any of the mmXXX commands that require ccr will start ccr and sdrserv. So unless you have a daeamon actively seeking and killing ccr, I don't see why systemd is a problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jul 28 19:44:28 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 18:44:28 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Yeah, not sure why yet but when I shutdown the cluster using our Systemd configuration this kills the daemons, but mmshutdown obviously doesn't. I'll dig into my problems with that. Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 1:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR vs systemd My experiments show that any of the mmXXX commands that require ccr will start ccr and sdrserv. So unless you have a daeamon actively seeking and killing ccr, I don't see why systemd is a problem. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 21:16:30 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 16:16:30 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Allow me to restate and demonstrate: Even if systemd or any explicit kill signals destroy any/all running mmcr* and mmsdr* processes, simply running mmlsconfig will fire up new mmcr* and mmsdr* processes. For example: ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes [root at n2 gpfs-git]# ps auwx | grep mm root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep --color=auto mm [root at n2 gpfs-git]# mmlsconfig Configuration data for cluster madagascar.frozen: ------------------------------------------------- clusterName madagascar.frozen ... worker1Threads 1022 adminMode central File systems in cluster madagascar.frozen: ------------------------------------------ /dev/mak /dev/x1 /dev/yy /dev/zz ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it restarts them! [root at n2 gpfs-git]# ps auwx | grep mm root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 10358 0.0 0.0 1700488 17636 ? Sl 12:58 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep --color=auto mm -------------- next part -------------- An HTML attachment was scrubbed... 
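Marc's demonstration above is easy to repeat as a quick sanity check on a test node; the following is essentially his sequence written out as a script, with pkill and the ps filters being the only additions.

# repeat Marc's experiment: kill the helpers, run a CCR-dependent command, watch them come back
pkill -9 -f mmccrmonitor; pkill -9 -f mmsdrserv
ps -ef | egrep 'mm(ccrmonitor|sdrserv)' | grep -v egrep    # expect no output now
/usr/lpp/mmfs/bin/mmlsconfig > /dev/null                   # any CCR-using mm command will do
sleep 5
ps -ef | egrep 'mm(ccrmonitor|sdrserv)' | grep -v egrep    # the helpers are respawned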
URL: From aaron.s.knister at nasa.gov Thu Jul 28 22:29:22 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 28 Jul 2016 17:29:22 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <83afcc2a-699a-d0b8-4f89-5e9dd7d3370e@nasa.gov> Hi Marc, I've seen systemd be overly helpful (read: not at all helpful) when it observes state changing outside of its control. There was a bug I encountered with GPFS (although the real issue may have been systemd, but the fix was put into GPFS) by which GPFS filesystems would get unmounted a split second after they were mounted, by systemd. The fs would mount but systemd decided the /dev/$fs device wasn't "ready" so it helpfully unmounted the filesystem. I don't know much about systemd (avoiding it) but based on my experience with it I could certainly see a case where systemd may actively kill the sdrserv process shortly after it's started by the mm* commands if systemd doesn't expect it to be running. I'd be curious to see the output of /var/adm/ras/mmsdrserv.log from the manager nodes to see if sdrserv is indeed starting but getting harpooned by systemd. -Aaron On 7/28/16 4:16 PM, Marc A Kaplan wrote: > Allow me to restate and demonstrate: > > Even if systemd or any explicit kill signals destroy any/all running > mmcr* and mmsdr* processes, > > simply running mmlsconfig will fire up new mmcr* and mmsdr* processes. > For example: > > ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes > > [root at n2 gpfs-git]# ps auwx | grep mm > root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep > --color=auto mm > > [root at n2 gpfs-git]# mmlsconfig > Configuration data for cluster madagascar.frozen: > ------------------------------------------------- > clusterName madagascar.frozen > ... > worker1Threads 1022 > adminMode central > > File systems in cluster madagascar.frozen: > ------------------------------------------ > /dev/mak > /dev/x1 > /dev/yy > /dev/zz > > ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it > restarts them! > > [root at n2 gpfs-git]# ps auwx | grep mm > root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 > /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac > root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 > root 10358 0.0 0.0 1700488 17636 ? 
Sl 12:58 0:00 python > /usr/lpp/mmfs/bin/mmsysmon.py > root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep > --color=auto mm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 29 16:56:14 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 29 Jul 2016 15:56:14 +0000 Subject: [gpfsug-discuss] mmchqos and already running maintenance commands Message-ID: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> Hi All, Looking for a little clarification here ? in the man page for mmchqos I see: * When you change allocations or mount the file system, a brief delay due to reconfiguration occurs before QoS starts applying allocations. If I?m already running a maintenance command and then I run an mmchqos does that mean that the already running maitenance command will adjust to the new settings or does this only apply to subsequently executed maintenance commands? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 29 18:18:22 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 29 Jul 2016 17:18:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2.1 Released Message-ID: <5E104D88-1A80-4FF2-B721-D0BF4B930CCE@nuance.com> Version 4.2.1 is out on Fix Central and has a bunch of new features and improvements, many of which have been discussed at recent user group meetings. What's new: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1xx_soc.htm Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jul 29 18:57:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 29 Jul 2016 13:57:31 -0400 Subject: [gpfsug-discuss] mmchqos and already running maintenance commands In-Reply-To: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> References: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> Message-ID: mmchqos fs --enable ... maintenance=1234iops ... Will apply the new settings to all currently running and future maintenance commands. There is just a brief delay (I think it is well under 30 seconds) for the new settings to be propagated and become effective on each node. You can use `mmlsqos fs --seconds 70` to observe performance. Better, install gnuplot and run samples/charts/qosplot.pl or hack the script to push the data into your favorite plotter. --marc From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/29/2016 11:57 AM Subject: [gpfsug-discuss] mmchqos and already running maintenance commands Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Looking for a little clarification here ? in the man page for mmchqos I see: * When you change allocations or mount the file system, a brief delay due to reconfiguration occurs before QoS starts applying allocations. 
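(To make Marc's answer concrete: with a made-up device name of gpfs0, the pattern is simply to change the allocation and then watch the already-running maintenance command pick it up, for example:)

mmchqos gpfs0 --enable pool=system,maintenance=300IOPS,other=unlimited
mmlsqos gpfs0 --seconds 70    # an in-flight mmrestripefs or mmfsck should settle at the new cap well within a minute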
If I?m already running a maintenance command and then I run an mmchqos does that mean that the already running maitenance command will adjust to the new settings or does this only apply to subsequently executed maintenance commands? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Fri Jul 1 11:32:13 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Fri, 1 Jul 2016 10:32:13 +0000 Subject: [gpfsug-discuss] Trapped Inodes Message-ID: Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From makaplan at us.ibm.com Fri Jul 1 17:29:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 1 Jul 2016 12:29:31 -0400 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: References: Message-ID: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. 
The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Sat Jul 2 11:05:34 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Sat, 2 Jul 2016 10:05:34 +0000 Subject: [gpfsug-discuss] Trapped Inodes Message-ID: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. 
A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sat Jul 2 20:16:55 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sat, 2 Jul 2016 15:16:55 -0400 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) 
GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Sun Jul 3 11:32:24 2016 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Sun, 3 Jul 2016 12:32:24 +0200 Subject: [gpfsug-discuss] Improving Testing Efficiency with IBM Spectrum Scale for Automated Driving Message-ID: In the press today: "Tesla Autopilot partner Mobileye comments on fatal crash, says tech isn?t meant to avoid this type of accident." http://electrek.co/2016/07/01/tesla-autopilot-mobileye-fatal-crash-comment/ "Tesla?s autopilot system was designed in-house and uses a fusion of dozens of internally- and externally-developed component technologies to determine the proper course of action in a given scenario. Since January 2016, Autopilot activates automatic emergency braking in response to any interruption of the ground plane in the path of the vehicle that cross-checks against a consistent radar signature. In the case of this accident, the high, white side of the box truck, combined with a radar signature that would have looked very similar to an overhead sign, caused automatic braking not to fire.? More testing is needed ! Finding a way to improve ADAS/AD testing throughput by factor. 
more HiL tests would have better helped to avoid this accident I guess, as white side box trucks are very common on the roads arent't they? So another strong reason to use GPFS/SpectrumScale/ESS filesystems to provide video files to paralell HiL stations for testing and verification using IBM AREMA for Automotive as essence system in order to find the relevant test cases. Facts: Currently most of the testing is done by copying large video files from some kind of "slow" NAS filer to the HiL stations and running the HiL test case from the internal HiL disks. A typical HiL test run takes 7-9min while the copy alone takes an additional 3-5 min upfront depending on the setup. Together with IBM partner SVA we tested to stream these video files from a ESS GL6 directly to the HiL stations without to copy them first. This worked well and the latency was fine and stable. As a result we could improve the number of HiL test cases per month by a good factor without adding more HiL hardware. See my presentation from the GPFS User Day at SPXXL 2016 in Garching February 17th 2016 9:00 - 9:30 Improving Testing Efficiency with IBM Spectrum Scale for Automated Driving https://www.spxxl.org/sites/default/files/GPFS-AREMA-TSM_et_al_for_ADAS_AD_Testing-Feb2016.pdf More: http://electrek.co/2015/10/14/tesla-reveals-all-the-details-of-its-autopilot-and-its-software-v7-0-slide-presentation-and-audio-conference/ -frank- P.S. HiL = Hardware in the Loop https://en.wikipedia.org/wiki/Hardware-in-the-loop_simulation Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach mailto:kraemerf at de.ibm.com voice: +49-(0)171-3043699 / +4970342741078 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Sun Jul 3 15:55:26 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Sun, 3 Jul 2016 14:55:26 +0000 Subject: [gpfsug-discuss] Trapped Inodes In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. 
On 1 Jul 2016 5:30 pm, Marc A Kaplan > wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Jul 3 19:42:32 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 3 Jul 2016 14:42:32 -0400 Subject: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! 
In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: mmdf statistics are not real-time accurate, there is a trade off in accuracy vs the cost of polling each node that might have the file system mounted. That said, here are some possibilities, in increasing order of impact on users and your possible desperation ;-) A1. Wait a while (at most a few minutes) and see if the mmdf stats are updated. A2. mmchmgr fs another-node may force new stats to be sent to the new fs manager. (Not sure but I expect it will.) B. Briefly quiesce the file system with: mmfsctl fs suspend; mmfsctl fs resume; C. If you have no users active ... I'm pretty sure mmumount fs -a ; mmmount fs -a; will clear the problem ... but there's always D. mmshutdown -a ; mmstartup -a E. If none of those resolve the situation something is hosed -- F. hope that mmfsck can fix it. --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/03/2016 10:55 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. 
--marc From: Luke Raimbach To: gpfsug main discussion list Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Mon Jul 4 10:44:02 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 4 Jul 2016 09:44:02 +0000 Subject: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! In-Reply-To: References: <9486ee22-34cf-4e14-bc9b-3e824609e9ec@email.android.com> Message-ID: Hi Marc, Thanks again for the suggestions. An interesting report in the log while the another node took over managing the filesystem: Mon Jul 4 10:24:08.616 2016: [W] Inode space 10 in file system gpfs is approaching the limit for the maximum number of inodes. Inode space 10 was the independent fileset that the snapshot creation/deletion managed to remove. 
Still getting negative inode numbers reported after migrating manager functions and suspending/resuming the file system: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 I?ll have to wait until later today to try unmounting, daemon recycle or mmfsck. Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 03 July 2016 19:43 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Trapped Inodes - releases, now count them properly! mmdf statistics are not real-time accurate, there is a trade off in accuracy vs the cost of polling each node that might have the file system mounted. That said, here are some possibilities, in increasing order of impact on users and your possible desperation ;-) A1. Wait a while (at most a few minutes) and see if the mmdf stats are updated. A2. mmchmgr fs another-node may force new stats to be sent to the new fs manager. (Not sure but I expect it will.) B. Briefly quiesce the file system with: mmfsctl fs suspend; mmfsctl fs resume; C. If you have no users active ... I'm pretty sure mmumount fs -a ; mmmount fs -a; will clear the problem ... but there's always D. mmshutdown -a ; mmstartup -a E. If none of those resolve the situation something is hosed -- F. hope that mmfsck can fix it. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/03/2016 10:55 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for that suggestions. This seems to have removed the NULL fileset from the list, however mmdf now shows even more strange statistics: Inode Information ----------------- Total number of used inodes in all Inode spaces: -103900000 Total number of free inodes in all Inode spaces: -24797856 Total number of allocated inodes in all Inode spaces: -128697856 Total of Maximum number of inodes in all Inode spaces: -103900000 Any ideas why these negative numbers are being reported? Cheers, Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: 02 July 2016 20:17 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Trapped Inodes I have been informed that it is possible that a glitch can occur (for example an abrupt shutdown) which can leave you in a situation where it looks like all snapshots are deleted, but there is still a hidden snapshot that must be cleaned up... The workaround is to create a snapshot `mmcrsnapshot fs dummy` and then delete it `mmdelsnapshot fs dummy` and see if that clears up the situation... --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/02/2016 06:05 AM Subject: Re: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi Marc, Thanks for the suggestion. Snapshots were my first suspect but there are none anywhere on the filesystem. Cheers, Luke. On 1 Jul 2016 5:30 pm, Marc A Kaplan > wrote: Question and Suggestion: Do you have any snapshots that might include files that were in the fileset you are attempting to delete? Deleting those snapshots will allow the fileset deletion to complete. 
The snapshots are kinda intertwined with what was the "live" copy of the inodes. In the GPFS "ditto" implementation of snapshotting, for a file that has not changed since the snapshot operation, the snapshot copy is not really a copy but just a pointer to the "live" file. So even after you have logically deleted the "live" files, the snapshot still points to those inodes you thought you deleted. Rather than invalidate the snapshot, (you wouldn't want that, would you?!) GPFS holds onto the inodes, until they are no longer referenced by any snapshot. --marc From: Luke Raimbach > To: gpfsug main discussion list > Date: 07/01/2016 06:32 AM Subject: [gpfsug-discuss] Trapped Inodes Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I've run out of inodes on a relatively small filesystem. The total metadata capacity allows for a maximum of 188,743,680 inodes. A fileset containing 158,000,000 inodes was force deleted and has gone into a bad state, where it is reported as (NULL) and has state "deleted": Attributes for fileset (NULL): =============================== Status Deleted Path -- Id 15 Root inode latest: Parent Id Created Wed Jun 15 14:07:51 2016 Comment Inode space 8 Maximum number of inodes 158000000 Allocated inodes 158000000 Permission change flag chmodAndSetacl afm-associated No Offline mmfsck fixed a few problems, but didn't free these poor, trapped inodes. Now I've run out and mmdf is telling me crazy things like this: Inode Information ----------------- Total number of used inodes in all Inode spaces: 0 Total number of free inodes in all Inode spaces: 27895680 Total number of allocated inodes in all Inode spaces: 27895680 Total of Maximum number of inodes in all Inode spaces: 34100000 Current GPFS build: "4.2.0.3". Who will help me rescue these inodes? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Luke.Raimbach at crick.ac.uk Tue Jul 5 15:25:06 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 5 Jul 2016 14:25:06 +0000 Subject: [gpfsug-discuss] Samba Export Anomalies Message-ID: Hi All, I'm having a frustrating time exporting an Independent Writer AFM fileset through Samba. Native GPFS directories exported through Samba seem to work properly, but when creating an export which points to an AFM IW fileset, I get "Access Denied" errors when trying to create files from an SMB client and even more unusual "Failed to enumerate objects in the container: Access is denied." messages if I try to modify the Access Control Entries through a Windows client. Here is the smb.conf file: ***[BEGIN smb.conf]*** [global] idmap config * : backend = autorid idmap config * : range = 100000-999999 idmap config THECRICK : backend = ad idmap config THECRICK : schema_mode = rfc2307 idmap config THECRICK : range = 30000000-31999999 local master = no realm = THECRICK.ORG security = ADS aio read size = 1 aio write size = 1 async smb echo handler = yes clustering = yes ctdbd socket = /var/run/ctdb/ctdbd.socket ea support = yes force unknown acl user = yes level2 oplocks = no log file = /var/log/samba/log.%m log level = 3 map hidden = yes map readonly = no netbios name = MS_GENERAL printcap name = /etc/printcap printing = lprng server string = Samba Server Version %v socket options = TCP_NODELAY SO_KEEPALIVE TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15 store dos attributes = yes strict allocate = yes strict locking = no unix extensions = no vfs objects = shadow_copy2 syncops fileid streams_xattr gpfs gpfs:dfreequota = yes gpfs:hsm = yes gpfs:leases = yes gpfs:prealloc = yes gpfs:sharemodes = yes gpfs:winattr = yes nfs4:acedup = merge nfs4:chown = yes nfs4:mode = simple notify:inotify = yes shadow:fixinodes = yes shadow:format = @GMT-%Y.%m.%d-%H.%M.%S shadow:snapdir = .snapshots shadow:snapdirseverywhere = yes shadow:sort = desc smbd:backgroundqueue = false smbd:search ask sharemode = false syncops:onmeta = no workgroup = THECRICK winbind enum groups = yes winbind enum users = yes [production_rw] comment = Production writable path = /general/production read only = no [stp-test] comment = STP Test Export path = /general/export/stp/stp-test read-only = no ***[END smb.conf]*** The [production_rw] export is a test directory on the /general filesystem which works from an SMB client. The [stp-test] export is an AFM fileset on the /general filesystem which is a cache of a directory in another GPFS filesystem: ***[BEGIN mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** Attributes for fileset crick.general.export.stp.stp-test: ========================================================== Status Linked Path /general/export/stp/stp-test Id 1 Root inode 1048579 Parent Id 0 Created Fri Jul 1 15:56:48 2016 Comment Inode space 1 Maximum number of inodes 200000 Allocated inodes 100000 Permission change flag chmodAndSetacl afm-associated Yes Target gpfs:///camp/stp/stp-test Mode independent-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Gateway Flush Threads 4 Prefetch Threshold 0 (default) Eviction Enabled yes (default) ***[END mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** Anyone spot any glaringly obvious misconfigurations? Cheers, Luke. Luke Raimbach? 
Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From bbanister at jumptrading.com Tue Jul 5 15:58:35 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 5 Jul 2016 14:58:35 +0000 Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? In-Reply-To: <565240ad49e6476da9c1d3d11312f88c@mbxpsc1.winmail.deshaw.com> References: <565240ad49e6476da9c1d3d11312f88c@mbxpsc1.winmail.deshaw.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB061B491A@CHI-EXCHANGEW1.w2k.jumptrading.com> Wanted to comment that we also hit this issue and agree with Paul that it would be nice in the FAQ to at least have something like the vertical bars that denote changed or added lines in a document, which are seen in the GPFS Admin guides. This should make it easy to see what has changed. Would also be nice to "Follow this page" to get notifications of when the FAQ changes from my IBM Knowledge Center account... or maybe the person that publishes the changes could announce the update on the GPFS - Announce Developer Works page. https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001606 Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sanchez, Paul Sent: Friday, June 03, 2016 2:38 PM To: gpfsug main discussion list (gpfsug-discuss at spectrumscale.org) Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? After some puzzling debugging on our new Broadwell servers, all of which slowly became brick-like upon after getting stuck starting GPFS, we discovered that this was already a known issue in the FAQ. Adding "nosmap" to the kernel command line in grub prevents SMAP from seeing the kernel-userspace memory interactions of GPFS as a reason to slowly grind all cores to a standstill, apparently spinning on stuck locks(?). (Big thanks go to RedHat for turning us on to the answer when we opened a case.) >From https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html, section 3.2: Note: In order for IBM Spectrum Scale on RHEL 7 to run on the Haswell processor * Disable the Supervisor Mode Access Prevention (smap) kernel parameter * Reboot the RHEL 7 node before using GPFS Some observations worth noting: 1. We've been running for a year with Haswell processors and have hundreds of Haswell RHEL7 nodes which do not exhibit this problem. So maybe this only really affects Broadwell CPUs? 2. It would be very nice for SpectrumScale to take a peek at /proc/cpuinfo and /proc/cmdline before starting up, and refuse to break the host when it has affected processors and kernel without "nosmap". Instead, an error message describing the fix would have made my day. 3. I'm going to have to start using a script to diff the FAQ for these gotchas, unless anyone knows of a better way to subscribe just to updates to this doc. Thanks, Paul Sanchez ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From duersch at us.ibm.com Tue Jul 5 19:31:28 2016 From: duersch at us.ibm.com (Steve Duersch) Date: Tue, 5 Jul 2016 14:31:28 -0400 Subject: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? FAQ Updates Message-ID: The PDF version of the FAQ ( http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/gpfsclustersfaq.pdf ) does have change bars. Also at the top it lists the questions that have been changed. Your suggestion for "announcing" new faq version does make sense and I'll email the one responsible for posting the faq. Thank you. Steve Duersch Spectrum Scale (GPFS) FVTest 845-433-7902 IBM Poughkeepsie, New York Message: 2 Date: Tue, 5 Jul 2016 14:58:35 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell? Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB061B491A at CHI-EXCHANGEW1.w2k.jumptrading.com> Content-Type: text/plain; charset="us-ascii" Wanted to comment that we also hit this issue and agree with Paul that it would be nice in the FAQ to at least have something like the vertical bars that denote changed or added lines in a document, which are seen in the GPFS Admin guides. This should make it easy to see what has changed. Would also be nice to "Follow this page" to get notifications of when the FAQ changes from my IBM Knowledge Center account... or maybe the person that publishes the changes could announce the update on the GPFS - Announce Developer Works page. https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001606 Cheers, -Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From konstantin.arnold at unibas.ch Tue Jul 5 19:53:03 2016 From: konstantin.arnold at unibas.ch (Konstantin Arnold) Date: Tue, 5 Jul 2016 20:53:03 +0200 Subject: [gpfsug-discuss] Samba Export Anomalies In-Reply-To: References: Message-ID: <577C020F.2080507@unibas.ch> Hi Luke, probably I don't have enough information about your AFM setup but maybe you could check the ACLs on the export as well as ACLs on the directory to be mounted. If you are using AFM from a home location that has ACLs set then they will also be transferred to cache location. (We ran into similar issues when we had to take over data from a SONAS system that was assigning gid_numbers from an internal mapping table - all had to be cleaned up first before clients could have access through our CES system.) Best Konstantin On 07/05/2016 04:25 PM, Luke Raimbach wrote: > Hi All, > > I'm having a frustrating time exporting an Independent Writer AFM fileset through Samba. 
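Konstantin's suggestion, spelled out against the paths in Luke's original post, is to compare the NFSv4 ACLs at both ends of the AFM relationship, something like the following (the second command runs on the home cluster):

mmgetacl -k nfs4 /general/export/stp/stp-test    # cache fileset junction on the /general file system
mmgetacl -k nfs4 /camp/stp/stp-test              # corresponding home directory behind gpfs:///camp/stp/stp-test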
> > Native GPFS directories exported through Samba seem to work properly, but when creating an export which points to an AFM IW fileset, I get "Access Denied" errors when trying to create files from an SMB client and even more unusual "Failed to enumerate objects in the container: Access is denied." messages if I try to modify the Access Control Entries through a Windows client. > > Here is the smb.conf file: > > ***[BEGIN smb.conf]*** > > [global] > idmap config * : backend = autorid > idmap config * : range = 100000-999999 > idmap config THECRICK : backend = ad > idmap config THECRICK : schema_mode = rfc2307 > idmap config THECRICK : range = 30000000-31999999 > local master = no > realm = THECRICK.ORG > security = ADS > aio read size = 1 > aio write size = 1 > async smb echo handler = yes > clustering = yes > ctdbd socket = /var/run/ctdb/ctdbd.socket > ea support = yes > force unknown acl user = yes > level2 oplocks = no > log file = /var/log/samba/log.%m > log level = 3 > map hidden = yes > map readonly = no > netbios name = MS_GENERAL > printcap name = /etc/printcap > printing = lprng > server string = Samba Server Version %v > socket options = TCP_NODELAY SO_KEEPALIVE TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15 > store dos attributes = yes > strict allocate = yes > strict locking = no > unix extensions = no > vfs objects = shadow_copy2 syncops fileid streams_xattr gpfs > gpfs:dfreequota = yes > gpfs:hsm = yes > gpfs:leases = yes > gpfs:prealloc = yes > gpfs:sharemodes = yes > gpfs:winattr = yes > nfs4:acedup = merge > nfs4:chown = yes > nfs4:mode = simple > notify:inotify = yes > shadow:fixinodes = yes > shadow:format = @GMT-%Y.%m.%d-%H.%M.%S > shadow:snapdir = .snapshots > shadow:snapdirseverywhere = yes > shadow:sort = desc > smbd:backgroundqueue = false > smbd:search ask sharemode = false > syncops:onmeta = no > workgroup = THECRICK > winbind enum groups = yes > winbind enum users = yes > > [production_rw] > comment = Production writable > path = /general/production > read only = no > > [stp-test] > comment = STP Test Export > path = /general/export/stp/stp-test > read-only = no > > ***[END smb.conf]*** > > > The [production_rw] export is a test directory on the /general filesystem which works from an SMB client. The [stp-test] export is an AFM fileset on the /general filesystem which is a cache of a directory in another GPFS filesystem: > > > ***[BEGIN mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** > > Attributes for fileset crick.general.export.stp.stp-test: > ========================================================== > Status Linked > Path /general/export/stp/stp-test > Id 1 > Root inode 1048579 > Parent Id 0 > Created Fri Jul 1 15:56:48 2016 > Comment > Inode space 1 > Maximum number of inodes 200000 > Allocated inodes 100000 > Permission change flag chmodAndSetacl > afm-associated Yes > Target gpfs:///camp/stp/stp-test > Mode independent-writer > File Lookup Refresh Interval 30 (default) > File Open Refresh Interval 30 (default) > Dir Lookup Refresh Interval 60 (default) > Dir Open Refresh Interval 60 (default) > Async Delay 15 (default) > Last pSnapId 0 > Display Home Snapshots no > Number of Gateway Flush Threads 4 > Prefetch Threshold 0 (default) > Eviction Enabled yes (default) > > ***[END mmlsfileset general crick.general.export.stp.stp-test --afm -L]*** > > > Anyone spot any glaringly obvious misconfigurations? > > Cheers, > Luke. > > Luke Raimbach? 
> Senior HPC Data and Storage Systems Engineer, > The Francis Crick Institute, > Gibbs Building, > 215 Euston Road, > London NW1 2BE. > > E: luke.raimbach at crick.ac.uk > W: www.crick.ac.uk > > The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From r.sobey at imperial.ac.uk Wed Jul 6 10:37:29 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 09:37:29 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> Message-ID: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
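The "user does not have list permission on snapdir" message earlier in this mail can be narrowed down outside of Samba before blaming either layer. A rough, untested sketch (the user name is a placeholder; the paths are the ones from the log):

  # Does the snapshot directory list for root and for the affected user on the CES node?
  ls -ld /gpfs/prd/groupspace/ic/admin/ict/.snapshots
  sudo -u someuser ls /gpfs/prd/groupspace/ic/admin/ict/.snapshots

  # .snapshots itself tends to reject mmgetacl, so inspect the directory it hangs off instead
  mmgetacl -k nfs4 /gpfs/prd/groupspace/ic/admin/ict

  # In smb.conf, raising just the VFS debug class shows what vfs_shadow_copy2 decides,
  # assuming this build supports per-class log levels:
  #   log level = 1 vfs:10

If the user can list .snapshots from a shell but Samba still reports the access failure, that points at the shadow_copy2 access check rather than at the GPFS permissions themselves.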
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Jul 6 10:47:16 2016 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 6 Jul 2016 10:47:16 +0100 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> Message-ID: <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: > > Quick followup on this. Doing some more samba debugging (i.e. > increasing log levels!) 
and come up with the following: > > [2016/07/06 10:07:35.602080, 3] > ../source3/smbd/vfs.c:1322(check_reduced_name) > > check_reduced_name: > admin/ict/serviceoperations/slough_project/Slough_Layout reduced to > /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout > > [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) > > unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) > returning 0644 > > [2016/07/06 10:07:35.613374, 0] > ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) > > * user does not have list permission on snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots* > > [2016/07/06 10:07:35.613416, 0] > ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) > > access denied on listing snapdir > /gpfs/prd/groupspace/ic/admin/ict/.snapshots > > [2016/07/06 10:07:35.613434, 0] > ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) > > FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, > failed - NT_STATUS_ACCESS_DENIED. > > [2016/07/06 10:07:47.648557, 3] > ../source3/smbd/service.c:1138(close_cnum) > > 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to > service IPC$ > > Any takers? I cannot run mmgetacl on the .snapshots folder at all, as > root. A snapshot I just created to make sure I had full control on the > folder: (39367 is me, I didn?t run this command on a CTDB node so the > UID mapping isn?t working). > > [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 > > #NFSv4 ACL > > #owner:root > > #group:root > > group:74036:r-x-:allow:FileInherit:DirInherit:Inherited > > (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > > (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL > (-)WRITE_ATTR (-)WRITE_NAMED > > user:39367:rwxc:allow:FileInherit:DirInherit:Inherited > > (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL > (X)READ_ATTR (X)READ_NAMED > > (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL > (X)WRITE_ATTR (X)WRITE_NAMED > > *From:*gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of > *Sobey, Richard A > *Sent:* 20 June 2016 16:03 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but > our customers have come to like previous versions and indeed it is > sort of a selling point for us. > > Samba is the only thing we?ve changed recently after the badlock > debacle so I?m tempted to blame that, but who knows. > > If (when) I find out I?ll let everyone know. > > Richard > > *From:*gpfsug-discuss-bounces at spectrumscale.org > [mailto:gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of > *Buterbaugh, Kevin L > *Sent:* 20 June 2016 15:56 > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Hi Richard, > > I can?t answer your question but I can tell you that we have > experienced either the exact same thing you are or something very > similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 > and it persists even after upgraded to GPFS 4.2.0.3 and the very > latest sernet-samba. > > And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* > upgrade SAMBA versions at that time. Therefore, I believe that > something changed in GPFS. That doesn?t mean it?s GPFS? 
fault, of > course. SAMBA may have been relying on a > bugundocumented feature in GPFS that IBM fixed > for all I know, and I?m obviously speculating here. > > The problem we see is that the .snapshots directory in each folder can > be cd?d to but is empty. The snapshots are all there, however, if you: > > cd //.snapshots/ taken>/rest/of/path/to/folder/in/question > > This obviously prevents users from being able to do their own recovery > of files unless you do something like what you describe, which we are > unwilling to do for security reasons. We have a ticket open with DDN? > > Kevin > > On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > > wrote: > > Hi all > > Can someone clarify if the ability for Windows to view snapshots > as Previous Versions is exposed by SAMBA or GPFS? Basically, if > suddenly my users cannot restore files from snapshots over a CIFS > share, where should I be looking? > > I don?t know when this problem occurred, but within the last few > weeks certainly our users with full control over their data now > see no previous versions available, but if we export their fileset > and set ?force user = root? all the snapshots are available. > > I think the answer is SAMBA, right? We?re running GPFS 3.5 and > sernet-samba 4.2.9. > > Many thanks > > Richard > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss atspectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > > Kevin Buterbaugh - Senior System Administrator > > Vanderbilt University - Advanced Computing Center for Research and > Education > > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 10:55:14 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 09:55:14 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Sure. 
It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. 
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn't run this command on a CTDB node so the UID mapping isn't working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we've changed recently after the badlock debacle so I'm tempted to blame that, but who knows. If (when) I find out I'll let everyone know. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can't answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn't mean it's GPFS' fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I'm obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd'd to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN... Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don't know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set "force user = root" all the snapshots are available. I think the answer is SAMBA, right? We're running GPFS 3.5 and sernet-samba 4.2.9. 
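One thing worth keeping in mind on the "SAMBA or GPFS" question: shadow_copy2 only surfaces snapshots whose names match the configured shadow:format (here @GMT-%Y.%m.%d-%H.%M.%S), so Previous Versions depends on both sides being lined up. A small sketch of creating and checking snapshots with matching names; the device name gpfsdev is a placeholder:

  # Create a snapshot named the way shadow:format expects
  mmcrsnapshot gpfsdev @GMT-$(date -u +%Y.%m.%d-%H.%M.%S)

  # List what exists and what Samba should be able to map
  mmlssnapshot gpfsdev
  ls /gpfs/prd/groupspace/ic/.snapshots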
Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com [http://pixitmedia.com/sig/sig-cio.jpg] This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Wed Jul 6 12:50:56 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Wed, 6 Jul 2016 11:50:56 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 13:22:53 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 12:22:53 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch, (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = 
Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
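Since the same smb.conf is meant to be active on every CTDB node, it can be worth confirming that each node really runs with the same effective shadow copy settings and Samba build. A quick sketch; the config path is the stock default and may differ for the Sernet packages:

  # Effective configuration as parsed by Samba on this node
  testparm -s /etc/samba/smb.conf 2>/dev/null | grep -E 'vfs objects|shadow:'
  # If this build of testparm does not echo the parametric shadow:* options, grep the file directly
  grep -E 'vfs objects|shadow:' /etc/samba/smb.conf

  # And the binaries actually running
  smbd -V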
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jul 6 15:45:57 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 6 Jul 2016 07:45:57 -0700 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
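For anyone wanting to try Christof's suggestion of rebuilding with the fix, one possible route is to export the upstream commit and apply it to the matching source tree before rebuilding. This is an untested sketch with placeholder paths, on the assumption that the commit applies cleanly to a 4.2.x tree:

  # Export the fix from the upstream Samba tree
  git clone git://git.samba.org/samba.git samba.git
  cd samba.git
  git format-patch -1 fdbca5e13a0375d7f18639679a627e67c3df647a -o /tmp

  # Apply it to the source tree matching the running 4.2.9 packages and rebuild
  cd /path/to/samba-4.2.9        # placeholder: wherever the matching source lives
  patch -p1 < /tmp/0001-*.patch
  ./configure && make            # configure options need to mirror the original build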
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From r.sobey at imperial.ac.uk Wed Jul 6 15:54:25 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 14:54:25 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: Cheers Christof. We're using Sernet Samba [4.2.9] so we are limited by what they release. How can I identify which version of Samba is a) affected by the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel - sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ...appears to be the correct output and is consistent with someone else's GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch, (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point.
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
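On the "which version is affected and which has the patch" question, the upstream release tags containing a given commit can be listed straight from git, without reading any C. A sketch, assuming a local clone of the Samba tree as above:

  cd samba.git
  # Releases that introduced the stricter snapdir access check
  git tag --contains acbb4ddb6876c15543c5370e6d27faacebc8a231
  # Releases that already carry the follow-up fix Christof pointed at
  git tag --contains fdbca5e13a0375d7f18639679a627e67c3df647a
  # Version of the binaries currently installed on the CES/CTDB nodes
  smbd -V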
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 6 16:21:06 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 6 Jul 2016 15:21:06 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
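One quick way to separate the server-side snapshot plumbing from the Previous Versions dialog is to ask for a snapshot path directly over SMB: vfs_shadow_copy2 understands @GMT- prefixed paths. A hedged sketch, where the server name "store", the share [IC], the sub-path and the snapshot name are taken from this thread and "someuser" is a placeholder:

# list inside a specific snapshot via the @GMT path prefix handled by vfs_shadow_copy2
smbclient //store/IC -U someuser -c 'ls @GMT-2016.07.06-08.00.06/admin/ict/*'
# compare with the live view of the same directory
smbclient //store/IC -U someuser -c 'ls admin/ict/*'

If the @GMT listing works but the Previous Versions tab stays empty, the failure is specific to the snapshot enumeration (the FSCTL_GET_SHADOW_COPY_DATA denial in the log above), which is exactly where the check_access_snapdir change bites.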
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 6 16:21:06 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 6 Jul 2016 15:21:06 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Hi Richard, Is that a typo in the version? We're also using Sernet Samba but we've got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affected by the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here?
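For anyone wanting to try what Christof suggests here, rebuilding the SerNet 4.2.9 package with the referenced vfs_gpfs fix applied, the outline is roughly as follows; the package, spec and patch file names are assumptions and will differ between SerNet releases:

# fetch and install the matching source package (assumes the SerNet source repo is available)
yumdownloader --source sernet-samba
rpm -ivh sernet-samba-4.2.9-19.src.rpm
# pull the referenced commit as a patch from git.samba.org
wget -O ~/rpmbuild/SOURCES/vfs_gpfs-snapdir.patch \
  'https://git.samba.org/?p=samba.git;a=patch;h=fdbca5e13a0375d7f18639679a627e67c3df647a'
# add a matching PatchNN: / %patch entry to the spec, then rebuild
rpmbuild -ba ~/rpmbuild/SPECS/samba.spec

The rebuilt packages could be installed on one CTDB node first to confirm that Previous Versions come back before rolling it out further.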
Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. ) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. 
It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. 
[2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. 
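A quick way to pin down exactly which Samba build is answering, and whether the packaged build already mentions the shadow_copy2/vfs_gpfs changes discussed elsewhere in this thread, is to query the binary and the packages directly; the package names below are the SerNet ones from this thread and will differ for other builds:

# confirm the running version
smbd --version
rpm -q sernet-samba sernet-samba-libs
# look for any mention of the relevant fixes in the packaged changelog (may well be empty)
rpm -q --changelog sernet-samba-libs | grep -iE 'shadow_copy2|vfs_gpfs|11658'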
Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 6 16:26:36 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 6 Jul 2016 15:26:36 +0000 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu> <57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> <6C7203C4-27D4-40E7-979F-B186D4622BC7@vanderbilt.edu> Message-ID: By the way, we are planning to go to CES / 4.2.x in a matter of weeks, but understanding this problem was quite important for me. Perhaps knowing now that the fix is probably to install a different version of Samba, we?ll probably leave it alone. Thank you everyone for your help, Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 06 July 2016 16:23 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions I?m afraid it?s not a typo ? 
[root at server gpfs]# rpm -qa | grep sernet sernet-samba-ctdb-tests-4.2.9-19.el6.x86_64 sernet-samba-common-4.2.9-19.el6.x86_64 sernet-samba-winbind-4.2.9-19.el6.x86_64 sernet-samba-ad-4.2.9-19.el6.x86_64 sernet-samba-libs-4.2.9-19.el6.x86_64 sernet-samba-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-libsmbclient0-4.2.9-19.el6.x86_64 sernet-samba-ctdb-4.2.9-19.el6.x86_64 sernet-samba-libwbclient-devel-4.2.9-19.el6.x86_64 sernet-samba-client-4.2.9-19.el6.x86_64 sernet-samba-debuginfo-4.2.9-19.el6.x86_64 From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 06 July 2016 16:21 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, Is that a typo in the version? We?re also using Sernet Samba but we?ve got 4.3.9? Kevin On Jul 6, 2016, at 9:54 AM, Sobey, Richard A > wrote: Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affect by the the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shippied in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel ? sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ?appears to be the correct output and is consistent with someone else?s GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point. 
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = 
Unified Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
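To relate the "user does not have list permission on snapdir" error above back to the filesystem, it can help to repeat the same check outside Samba as the mapped Unix identity of the user; the UID and paths below are the ones quoted in the trace, and "gpfsdev" is a placeholder device name:

# can the mapped user actually list the snapshot directory that shadow_copy2 is probing?
sudo -u '#39367' ls -la /gpfs/prd/groupspace/ic/admin/ict/.snapshots
# what ACL applies on the parent directory (mmgetacl on .snapshots itself is expected to fail)
mmgetacl -k nfs4 /gpfs/prd/groupspace/ic/admin/ict
# and which snapshots exist at all
mmlssnapshot gpfsdev

If the ls fails for that user but works as root, it looks like an ACL or idmap question rather than a Samba one; if it works for the user yet Samba still logs the denial, suspicion falls back on the vfs_shadow_copy2 check itself, which is where the patches linked earlier in the thread come in.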
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can?t answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgraded to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn?t mean it?s GPFS? fault, of course. SAMBA may have been relying on a bugundocumented feature in GPFS that IBM fixed for all I know, and I?m obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd?d to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN? Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A > wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don?t know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set ?force user = root? all the snapshots are available. I think the answer is SAMBA, right? We?re running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc.ken.tw25qn at gmail.com Wed Jul 6 16:37:56 2016 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Wed, 6 Jul 2016 16:37:56 +0100 Subject: [gpfsug-discuss] =?utf-8?q?vn511i7_/_Windoqws_prevmqq1qqqqqq2qqqq?= =?utf-8?q?qqqqqqqqaqqa=C3=A0a=C3=A5io8iusk_versions?= Message-ID: 9G4HTGTB kk38?vv On 6 Jul 2016 15:46, "Christof Schmitt" wrote: > > The message in the trace confirms that this is triggered by: > https://git.samba.org/?p=samba.git;a=commitdiff;h=4 > > I 2asuspect that the Samba version used misses the patch > https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a > > The CES build of Samba shippied in Spectrum Scale includes the mentioned > patch, and that should avoid the problem seen. Would it be possible to > build Samba again with the mentioned patch to test whether that fixes the > issue seen here? > > Regards, > > Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ > christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) > > > > From: "Sobey, Richard A" > To: gpfsug main discussion list > Date: 07/06/2016 05:23 AM > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Thanks Daniel ? sorry to be dense, but does this indicate working as > intended, or a bug? I assume the former. So, the question still remains > how has this suddenly broken, when: > > [root at server ict]# mmgetacl -k nfs4 .snapshots/ > .snapshots/: Operation not permitted > > ?appears to be the correct output and is consistent with someone else?s > GPFS cluster where it is working. > > Cheers > > Richard > > From: gpfsug-discuss-bounces at spectrumscale.org [ > mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel > Kidger > Sent: 06 July 2016 12:51 > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions > > Looking at recent patches to SAMBA I see from December 2015: > https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch > , > (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which > includes the comment: > Failing that, smbd_check_access_rights should check Unix perms at that > point. 
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From christof.schmitt at us.ibm.com Wed Jul 6 17:19:40 2016 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 6 Jul 2016 09:19:40 -0700 Subject: [gpfsug-discuss] Snapshots / Windows previous versions In-Reply-To: References: , <444210FB-0E5A-4103-88E6-5079CCD9E7D0@vanderbilt.edu><57b56123-9463-8f96-738c-3cd6d2c63af0@pixitmedia.com> Message-ID: The first patch is at least in Samba 4.2 and newer. The patch to the vfs_gpfs module is only in Samba 4.3 and newer. So any of these should fix your problem: - Add the vfs_gpfs patch to the source code of Samba 4.2.9 and recompile the code. - Upgrade to Sernet Samba 4.3.x or newer - Change the Samba services to the ones provided through CES Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 07:54 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Cheers Christof. We're using Sernet Samba [4.2.9] so limited by what they release. How can I identify which version of Samba is a) affected by the first link and b) which version has got the patch incorporated? I'm not a developer as you can guess :) -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 06 July 2016 15:46 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions The message in the trace confirms that this is triggered by: https://git.samba.org/?p=samba.git;a=commitdiff;h=acbb4ddb6876c15543c5370e6d27faacebc8a231 I suspect that the Samba version used misses the patch https://git.samba.org/?p=samba.git;a=commitdiff;h=fdbca5e13a0375d7f18639679a627e67c3df647a The CES build of Samba shipped in Spectrum Scale includes the mentioned patch, and that should avoid the problem seen. Would it be possible to build Samba again with the mentioned patch to test whether that fixes the issue seen here? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) From: "Sobey, Richard A" To: gpfsug main discussion list Date: 07/06/2016 05:23 AM Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel - sorry to be dense, but does this indicate working as intended, or a bug? I assume the former. So, the question still remains how has this suddenly broken, when: [root at server ict]# mmgetacl -k nfs4 .snapshots/ .snapshots/: Operation not permitted ...appears to be the correct output and is consistent with someone else's GPFS cluster where it is working. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Daniel Kidger Sent: 06 July 2016 12:51 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Looking at recent patches to SAMBA I see from December 2015: https://download.samba.org/pub/samba/patches/security/samba-4.1.21-security-2015-12-16.patch , (link found at https://bugzilla.samba.org/show_bug.cgi?id=11658 which includes the comment: Failing that, smbd_check_access_rights should check Unix perms at that point.
) diff --git a/source3/modules/vfs_shadow_copy2.c b/source3/modules/vfs_shadow_copy2.c index fca05cf..07e2f8a 100644 --- a/source3/modules/vfs_shadow_copy2.c +++ b/source3/modules/vfs_shadow_copy2.c @@ -30,6 +30,7 @@ */ #include "includes.h" +#include "smbd/smbd.h" #include "system/filesys.h" #include "include/ntioctl.h" #include @@ -1138,6 +1139,42 @@ static char *have_snapdir(struct vfs_handle_struct *handle, return NULL; } +static bool check_access_snapdir(struct vfs_handle_struct *handle, + const char *path) +{ + struct smb_filename smb_fname; + int ret; + NTSTATUS status; + + ZERO_STRUCT(smb_fname); + smb_fname.base_name = talloc_asprintf(talloc_tos(), + "%s", + path); + if (smb_fname.base_name == NULL) { + return false; + } + + ret = SMB_VFS_NEXT_STAT(handle, &smb_fname); + if (ret != 0 || !S_ISDIR(smb_fname.st.st_ex_mode)) { + TALLOC_FREE(smb_fname.base_name); + return false; + } + + status = smbd_check_access_rights(handle->conn, + &smb_fname, + false, + SEC_DIR_LIST); + if (!NT_STATUS_IS_OK(status)) { + DEBUG(0,("user does not have list permission " + "on snapdir %s\n", + smb_fname.base_name)); + TALLOC_FREE(smb_fname.base_name); + return false; + } + TALLOC_FREE(smb_fname.base_name); + return true; +} + Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Sobey, Richard A" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Date: Wed, Jul 6, 2016 10:55 AM Sure. It might be easier if I just post the entire smb.conf: [global] netbios name = store workgroup = IC security = ads realm = IC.AC.UK kerberos method = secrets and keytab vfs objects = shadow_copy2 syncops gpfs fileid ea support = yes store dos attributes = yes map readonly = no map archive = no map system = no map hidden = no unix extensions = no allocation roundup size = 1048576 disable netbios = yes smb ports = 445 # server signing = mandatory template shell = /bin/bash interfaces = bond2 lo bond0 allow trusted domains = no printing = bsd printcap name = /dev/null load printers = no disable spoolss = yes idmap config IC : default = yes idmap config IC : cache time = 180 idmap config IC : backend = ad idmap config IC : schema_mode = rfc2307 idmap config IC : range = 500 - 2000000 idmap config * : range = 3000000 - 3500000 idmap config * : backend = tdb2 winbind refresh tickets = yes winbind nss info = rfc2307 winbind use default domain = true winbind offline logon = true winbind separator = / winbind enum users = true winbind enum groups = true winbind nested groups = yes winbind expand groups = 2 winbind max clients = 10000 clustering = yes ctdbd socket = /tmp/ctdb.socket gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = yes gpfs:dfreequota = yes # nfs4:mode = special # nfs4:chown = no nfs4:chown = yes nfs4:mode = simple nfs4:acedup = merge fileid:algorithm = fsname force unknown acl user = yes shadow:snapdir = .snapshots shadow:fixinodes = yes shadow:snapdirseverywhere = yes shadow:sort = desc syncops:onclose = no syncops:onmeta = no kernel oplocks = yes level2 oplocks = yes oplocks = yes notify:inotify = no wide links = no async smb echo handler = yes smbd:backgroundqueue = False use sendfile = no dmapi support = yes aio write size = 1 aio read size = 1 enable core files = no #debug logging log level = 2 log file = /var/log/samba.%m max log size = 1024 debug timestamp = yes [IC] comment = Unified 
Group Space Area path = /gpfs/prd/groupspace/ic public = no read only = no valid users = "@domain users" From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Barry Evans Sent: 06 July 2016 10:47 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Can you cut/paste your full VFS options for gpfs and shadow copy from smb.conf? On 06/07/2016 10:37, Sobey, Richard A wrote: Quick followup on this. Doing some more samba debugging (i.e. increasing log levels!) and come up with the following: [2016/07/06 10:07:35.602080, 3] ../source3/smbd/vfs.c:1322(check_reduced_name) check_reduced_name: admin/ict/serviceoperations/slough_project/Slough_Layout reduced to /gpfs/prd/groupspace/ic/admin/ict/serviceoperations/slough_project/Slough_Layout [2016/07/06 10:07:35.611881, 3] ../source3/smbd/dosmode.c:196(unix_mode) unix_mode(admin/ict/serviceoperations/slough_project/Slough_Layout) returning 0644 [2016/07/06 10:07:35.613374, 0] ../source3/modules/vfs_shadow_copy2.c:1211(check_access_snapdir) user does not have list permission on snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613416, 0] ../source3/modules/vfs_shadow_copy2.c:1380(shadow_copy2_get_shadow_copy_data) access denied on listing snapdir /gpfs/prd/groupspace/ic/admin/ict/.snapshots [2016/07/06 10:07:35.613434, 0] ../source3/modules/vfs_default.c:1145(vfswrap_fsctl) FSCTL_GET_SHADOW_COPY_DATA: connectpath /gpfs/prd/groupspace/ic, failed - NT_STATUS_ACCESS_DENIED. [2016/07/06 10:07:47.648557, 3] ../source3/smbd/service.c:1138(close_cnum) 155.198.55.14 (ipv4:155.198.55.14:51298) closed connection to service IPC$ Any takers? I cannot run mmgetacl on the .snapshots folder at all, as root. A snapshot I just created to make sure I had full control on the folder: (39367 is me, I didn?t run this command on a CTDB node so the UID mapping isn?t working). [root at icgpfs01 .snapshots]# mmgetacl -k nfs4 @GMT-2016.07.06-08.00.06 #NFSv4 ACL #owner:root #group:root group:74036:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED user:39367:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 20 June 2016 16:03 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Thanks Kevin. We are upgrading to GPFS 4.2 and CES in a few weeks but our customers have come to like previous versions and indeed it is sort of a selling point for us. Samba is the only thing we?ve changed recently after the badlock debacle so I?m tempted to blame that, but who knows. If (when) I find out I?ll let everyone know. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 20 June 2016 15:56 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Snapshots / Windows previous versions Hi Richard, I can't answer your question but I can tell you that we have experienced either the exact same thing you are or something very similar. It occurred for us after upgrading from GPFS 3.5 to 4.1.0.8 and it persists even after upgrading to GPFS 4.2.0.3 and the very latest sernet-samba. And to be clear, when we upgraded from GPFS 3.5 to 4.1 we did *not* upgrade SAMBA versions at that time. Therefore, I believe that something changed in GPFS. That doesn't mean it's GPFS' fault, of course. SAMBA may have been relying on a bug / undocumented feature in GPFS that IBM fixed for all I know, and I'm obviously speculating here. The problem we see is that the .snapshots directory in each folder can be cd'd to but is empty. The snapshots are all there, however, if you: cd //.snapshots//rest/of/path/to/folder/in/question This obviously prevents users from being able to do their own recovery of files unless you do something like what you describe, which we are unwilling to do for security reasons. We have a ticket open with DDN... Kevin On Jun 20, 2016, at 8:45 AM, Sobey, Richard A wrote: Hi all Can someone clarify if the ability for Windows to view snapshots as Previous Versions is exposed by SAMBA or GPFS? Basically, if suddenly my users cannot restore files from snapshots over a CIFS share, where should I be looking? I don't know when this problem occurred, but within the last few weeks certainly our users with full control over their data now see no previous versions available, but if we export their fileset and set "force user = root" all the snapshots are available. I think the answer is SAMBA, right? We're running GPFS 3.5 and sernet-samba 4.2.9. Many thanks Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Thu Jul 7 14:00:17 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:00:17 +0000 Subject: [gpfsug-discuss] Migration policy confusion Message-ID: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Hello all, I'm struggling to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I'm trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I'm simply trying to get the data to move to the NSDs in the GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it's now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried an mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Thu Jul 7 14:10:52 2016 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 7 Jul 2016 13:10:52 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: An HTML attachment was scrubbed...
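A minimal sketch of how a rule like the 'go_gold' one above is typically applied and checked, assuming a hypothetical filesystem device gpfs1 and the rule saved in a file called gold.pol (both names are illustrative, not from the thread):

    # apply the rule and ask for per-file detail; -I yes migrates the data immediately
    mmapplypolicy gpfs1 -P gold.pol -I yes -L 2
    # per-pool occupancy before/after is also printed by mmapplypolicy itself
    mmdf gpfs1
    # confirm which storage pool an individual file's data now lives in
    mmlsattr -L /gpfs/gpfs1/somefile.perf

The -I and -L options are the ones Marc Kaplan describes in more detail further down the thread.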
URL: From olaf.weiser at de.ibm.com Thu Jul 7 14:12:12 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 7 Jul 2016 15:12:12 +0200 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 14:16:19 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:16:19 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <640419CE-E989-47CD-999D-65EC249C9B8A@siriuscom.com> Olaf, thanks. Yes, the plan is to have SSDs for the system pool ultimately, but this is just a test system that I'm using to try to understand tiering better. The files (10 or so of them) are each 200MB in size. Mark From: on behalf of Olaf Weiser Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Migration policy confusion Hi, first of all, given that the metadata is stored in the system pool, system should be the "fastest" pool / underlying disks you have; with "slow" access to the metadata, access to data is very likely affected (except for cached data, where the metadata is cached). In addition, tell us how "big" the test files are that you moved by mmapplypolicy. Mit freundlichen Grüßen / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 03:00 PM Subject: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello all, I'm struggling to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I'm trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I'm simply trying to get the data to move to the NSDs in the GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it's now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried an mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R.
Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 14:16:53 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 13:16:53 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. 
Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Thu Jul 7 14:18:41 2016 From: service at metamodul.com (- -) Date: Thu, 7 Jul 2016 15:18:41 +0200 (CEST) Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> Message-ID: <318576846.22999.a23b5e71-bef0-4fc7-9542-12ecb401ec9e.open-xchange@email.1und1.de> An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 7 15:20:12 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 7 Jul 2016 10:20:12 -0400 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Message-ID: At the very least, LOOK at the messages output by the mmapplypolicy command at the beginning and end. The "occupancy" stats for each pool are shown BEFORE and AFTER the command does its work. In even more detail, it shows you how many files and how many KB of data were (or will be or would be) migrated. Also, options matter. ReadTheFineManuals. -I test vs -I defer vs -I yes. To see exactly which files are being migrated, use -L 2 To see exactly which files are being selected by your rule(s), use -L 3 And for more details about the files being skipped over, etc, etc, -L 6 Gee, I just checked the doc myself, I forgot some of the details and it's pretty good. Admittedly mmapplypolicy is a complex command. You can do somethings simply, only knowing a few options and policy rules, BUT... As my father used to say, "When all else fails, read the directions!" -L n Controls the level of information displayed by the mmapplypolicy command. Larger values indicate the display of more detailed information. These terms are used: candidate file A file that matches a MIGRATE, DELETE, or LIST policy rule. chosen file A candidate file that has been scheduled for action. 
These are the valid values for n: 0 Displays only serious errors. 1 Displays some information as the command runs, but not for each file. This is the default. 2 Displays each chosen file and the scheduled migration or deletion action. 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. For examples and more information on this flag, see the section: The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide. --marc From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 09:17 AM Subject: Re: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 7 15:30:33 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 7 Jul 2016 14:30:33 +0000 Subject: [gpfsug-discuss] Migration policy confusion In-Reply-To: References: <11D4769E-142A-414B-8DB5-94CBB8213A16@siriuscom.com> <662B1BCF-4746-4825-A55F-A5B9361D60C4@siriuscom.com> Message-ID: <877D722D-8CF5-496F-AAE5-7C0190E54D50@siriuscom.com> Thanks all. I realized that my file creation command was building 200k size files instead of the 200MB files. I fixed that and now I see the mmapplypolicy command take a bit more time and show accurate data as well as my bytes are now on the proper NSDs. It?s always some little thing that the human messes up isn?t it? ? From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 9:20 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Migration policy confusion At the very least, LOOK at the messages output by the mmapplypolicy command at the beginning and end. The "occupancy" stats for each pool are shown BEFORE and AFTER the command does its work. In even more detail, it shows you how many files and how many KB of data were (or will be or would be) migrated. Also, options matter. ReadTheFineManuals. -I test vs -I defer vs -I yes. To see exactly which files are being migrated, use -L 2 To see exactly which files are being selected by your rule(s), use -L 3 And for more details about the files being skipped over, etc, etc, -L 6 Gee, I just checked the doc myself, I forgot some of the details and it's pretty good. Admittedly mmapplypolicy is a complex command. You can do somethings simply, only knowing a few options and policy rules, BUT... As my father used to say, "When all else fails, read the directions!" -L n Controls the level of information displayed by the mmapplypolicy command. Larger values indicate the display of more detailed information. These terms are used: candidate file A file that matches a MIGRATE, DELETE, or LIST policy rule. chosen file A candidate file that has been scheduled for action. These are the valid values for n: 0 Displays only serious errors. 1 Displays some information as the command runs, but not for each file. This is the default. 2 Displays each chosen file and the scheduled migration or deletion action. 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 
6 Displays the same information as 5, plus non-candidate files and their attributes. For examples and more information on this flag, see the section: The mmapplypolicy -L command in the IBM Spectrum Scale: Problem Determination Guide. --marc From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/07/2016 09:17 AM Subject: Re: [gpfsug-discuss] Migration policy confusion Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Thanks Daniel. I did wait for 15-20 minutes after. From: on behalf of Daniel Kidger Reply-To: gpfsug main discussion list Date: Thursday, July 7, 2016 at 8:10 AM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Migration policy confusion Mark, For performance reasons, mmdf gets its data updated asynchronously. Did you try waiting a few minutes? Daniel Error! Filename not specified. Error! Filename not specified. Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-07818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] Migration policy confusion Date: Thu, Jul 7, 2016 2:00 PM Hello all, I?m struggling trying to understand tiering and policies in general in SpecScale. I have a single filesystem with two pools defined (system, GOLD). The GOLD pool is made up of some faster disks than the system pool. The policy I?m trying to get working is as follows RULE 'go_gold' MIGRATE FROM POOL 'system' TO POOL 'GOLD' WHERE (LOWER(NAME) LIKE '%.perf') I?m simply trying to get the data to move the NDS?s in GOLD pool. When I do an mmapplypolicy, mmlsattr shows that it?s now in the GOLD pool but when I do a mmdf the data shows 100% free still. I tried a mmrestripefs as well and no change to the mmdf output. Am I missing something here? Is this just normal behavior and the blocks will get moved at some other time? I guess I was expecting instant gratification and that those files would have been moved to the correct NSD. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Thu Jul 7 20:44:15 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Thu, 7 Jul 2016 15:44:15 -0400 Subject: [gpfsug-discuss] Introductions Message-ID: All, My name is Brian Marshall; I am a computational scientist at Virginia Tech. We have ~2PB GPFS install we are about to expand this Summer and I may have some questions along the way. Thanks, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jul 8 03:09:30 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Jul 2016 22:09:30 -0400 Subject: [gpfsug-discuss] mmpmon gfis fields question Message-ID: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Does anyone know what the fields in the mmpmon gfis output indicate? # socat /var/mmfs/mmpmon/mmpmonSocket - _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 _node_ local_node mmpmon gfis _response_ begin mmpmon gfis _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 _tu_ 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 _r_ 0 _w_ 0 _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ Here's my best guess: _d_ number of disks in the filesystem _br_ bytes read from disk _bw_ bytes written to disk _c_ cache ops _r_ read ops _w_ write ops _oc_ open() calls _cc_ close() calls _rdc_ read() calls _wc_ write() calls _dir_ readdir calls _iu_ inode update count _irc_ inode read count _idc_ inode delete count _icc_ inode create count _bc_ bytes read from cache _sch_ stat cache hits _scm_ stat cache misses This is all because the mmpmon fs_io_s command doesn't give me a way that I can find to distinguish block/stat cache hits from cache misses which makes it harder to pinpoint misbehaving applications on the system. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Jul 8 03:16:19 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 7 Jul 2016 19:16:19 -0700 Subject: [gpfsug-discuss] mmpmon gfis fields question In-Reply-To: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> References: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Message-ID: Hi, this is a undocumented mmpmon call, so you are on your own, but here is the correct description : _n_ IP address of the node responding. This is the address by which GPFS knows the node. _nn_ The name by which GPFS knows the node. _rc_ The reason/error code. In this case, the reply value is 0 (OK). _t_ Current time of day in seconds (absolute seconds since Epoch (1970)). _tu_ Microseconds part of the current time of day. _cl_ The name of the cluster that owns the file system. _fs_ The name of the file system for which data are being presented. _d_ The number of disks in the file system. _br_ Total number of bytes read from disk (not counting those read from cache.) _bw_ Total number of bytes written, to both disk and cache. _c_ The total number of read operations supplied from cache. _r_ The total number of read operations supplied from disk. _w_ The total number of write operations, to both disk and cache. _oc_ Count of open() call requests serviced by GPFS. 
_cc_ Number of close() call requests serviced by GPFS. _rdc_ Number of application read requests serviced by GPFS. _wc_ Number of application write requests serviced by GPFS. _dir_ Number of readdir() call requests serviced by GPFS. _iu_ Number of inode updates to disk. _irc_ Number of inode reads. _idc_ Number of inode deletions. _icc_ Number of inode creations. _bc_ Number of bytes read from the cache. _sch_ Number of stat cache hits. _scm_ Number of stat cache misses. On Thu, Jul 7, 2016 at 7:09 PM, Aaron Knister wrote: > Does anyone know what the fields in the mmpmon gfis output indicate? > > # socat /var/mmfs/mmpmon/mmpmonSocket - > _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 _node_ > local_node > mmpmon gfis > _response_ begin mmpmon gfis > _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 _tu_ > 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 _r_ 0 _w_ 0 > _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ > > > Here's my best guess: > > _d_ number of disks in the filesystem > _br_ bytes read from disk > _bw_ bytes written to disk > _c_ cache ops > _r_ read ops > _w_ write ops > _oc_ open() calls > _cc_ close() calls > _rdc_ read() calls > _wc_ write() calls > _dir_ readdir calls > _iu_ inode update count > _irc_ inode read count > _idc_ inode delete count > _icc_ inode create count > _bc_ bytes read from cache > _sch_ stat cache hits > _scm_ stat cache misses > > This is all because the mmpmon fs_io_s command doesn't give me a way that > I can find to distinguish block/stat cache hits from cache misses which > makes it harder to pinpoint misbehaving applications on the system. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Jul 8 04:13:59 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Jul 2016 23:13:59 -0400 Subject: [gpfsug-discuss] mmpmon gfis fields question In-Reply-To: References: <223e7955-1831-6ff5-40e6-bd534f4c2e2f@nasa.gov> Message-ID: Ah, thank you! That's a huge help. My preference, of course, would be to use documented calls but I'm already down that rabbit hole calling nsd_ds directly b/c the snmp agent chokes and dies a horrible death with 3.5k nodes and the number of NSDs we have. On 7/7/16 10:16 PM, Sven Oehme wrote: > Hi, > > this is a undocumented mmpmon call, so you are on your own, but here is > the correct description : > > > _n_ > > > > IP address of the node responding. This is the address by which GPFS > knows the node. > > _nn_ > > > > The name by which GPFS knows the node. > > _rc_ > > > > The reason/error code. In this case, the reply value is 0 (OK). > > _t_ > > > > Current time of day in seconds (absolute seconds since Epoch (1970)). > > _tu_ > > > > Microseconds part of the current time of day. > > _cl_ > > > > The name of the cluster that owns the file system. > > _fs_ > > > > The name of the file system for which data are being presented. > > _d_ > > > > The number of disks in the file system. > > _br_ > > > > Total number of bytes read from disk (not counting those read from cache.) > > _bw_ > > > > Total number of bytes written, to both disk and cache. 
> > _c_ > > > > The total number of read operations supplied from cache. > > _r_ > > > > The total number of read operations supplied from disk. > > _w_ > > > > The total number of write operations, to both disk and cache. > > _oc_ > > > > Count of open() call requests serviced by GPFS. > > _cc_ > > > > Number of close() call requests serviced by GPFS. > > _rdc_ > > > > Number of application read requests serviced by GPFS. > > _wc_ > > > > Number of application write requests serviced by GPFS. > > _dir_ > > > > Number of readdir() call requests serviced by GPFS. > > _iu_ > > > > Number of inode updates to disk. > > _irc_ > > > > Number of inode reads. > > _idc_ > > > > Number of inode deletions. > > _icc_ > > > > Number of inode creations. > > _bc_ > > > > Number of bytes read from the cache. > > _sch_ > > > > Number of stat cache hits. > > _scm_ > > > > Number of stat cache misses. > > > On Thu, Jul 7, 2016 at 7:09 PM, Aaron Knister > wrote: > > Does anyone know what the fields in the mmpmon gfis output indicate? > > # socat /var/mmfs/mmpmon/mmpmonSocket - > _event_ newconnection _t_ 1467937547 _tu_ 372882 _n_ 10.101.11.1 > _node_ local_node > mmpmon gfis > _response_ begin mmpmon gfis > _mmpmon::gfis_ _n_ 10.101.11.1 _nn_ lorej001 _rc_ 0 _t_ 1467937550 > _tu_ 518265 _cl_ disguise-gpfs _fs_ thome _d_ 5 _br_ 0 _bw_ 0 _c_ 0 > _r_ 0 _w_ 0 _oc_ 0 _cc_ 0 _rdc_ 0 _wc_ 0 _dir_ 0 _iu_ 0 _irc_ > > > Here's my best guess: > > _d_ number of disks in the filesystem > _br_ bytes read from disk > _bw_ bytes written to disk > _c_ cache ops > _r_ read ops > _w_ write ops > _oc_ open() calls > _cc_ close() calls > _rdc_ read() calls > _wc_ write() calls > _dir_ readdir calls > _iu_ inode update count > _irc_ inode read count > _idc_ inode delete count > _icc_ inode create count > _bc_ bytes read from cache > _sch_ stat cache hits > _scm_ stat cache misses > > This is all because the mmpmon fs_io_s command doesn't give me a way > that I can find to distinguish block/stat cache hits from cache > misses which makes it harder to pinpoint misbehaving applications on > the system. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mweil at wustl.edu Mon Jul 11 17:33:02 2016 From: mweil at wustl.edu (Matt Weil) Date: Mon, 11 Jul 2016 11:33:02 -0500 Subject: [gpfsug-discuss] CES sizing guide In-Reply-To: <375ba33c-894f-215f-4044-e4995761f640@wustl.edu> References: <375ba33c-894f-215f-4044-e4995761f640@wustl.edu> Message-ID: Hello all, > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Sizing%20Guidance%20for%20Protocol%20Node > > Is there any more guidance on this as one socket can be a lot of cores and memory today. > > Thanks > ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From mimarsh2 at vt.edu Tue Jul 12 14:12:17 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Tue, 12 Jul 2016 09:12:17 -0400 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: All, I have a Spectrum Scale 4.1 cluster serving data to 4 different client clusters (~800 client nodes total). I am looking for ways to monitor filesystem performance to uncover network bottlenecks or job usage patterns affecting performance. I received this info below from an IBM person. Does anyone have examples of aggregating mmperfmon data? Is anyone doing something different? "mmpmon does not currently aggregate cluster-wide data. As of SS 4.1.x you can look at "mmperfmon query" as well, but it also primarily only provides node specific data. The tools are built to script performance data but there aren't any current scripts available for you to use within SS (except for what might be on the SS wiki page). It would likely be something you guys would need to build, that's what other clients have done." Thank you, Brian Marshall Virginia Tech - Advanced Research Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Tue Jul 12 14:19:49 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 12 Jul 2016 13:19:49 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: <39E581EB-978D-4103-A2BC-FE4FF57B3608@nuance.com> Hi Brian I have a couple of pointers: - We have been running mmpmon for a while now across multiple clusters, sticking the data in external database for analysis. This has been working pretty well, but we are transitioning to (below) - SS 4.1 and later have built in zimon for collecting a wealth of performance data - this feeds into the built in GUI. But, there is bridge tools that IBM has built internally and keeps promising to release (I talked about it at the last SS user group meeting at Argonne) that allows use of Grafana with the zimon data. This is working well for us. Let me know if you want to discuss details and I will be happy to share my experiences and pointers in looking at the performance data. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Tuesday, July 12, 2016 at 9:12 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Aggregating filesystem performance All, I have a Spectrum Scale 4.1 cluster serving data to 4 different client clusters (~800 client nodes total). I am looking for ways to monitor filesystem performance to uncover network bottlenecks or job usage patterns affecting performance. I received this info below from an IBM person. Does anyone have examples of aggregating mmperfmon data? Is anyone doing something different? "mmpmon does not currently aggregate cluster-wide data. As of SS 4.1.x you can look at "mmperfmon query" as well, but it also primarily only provides node specific data. The tools are built to script performance data but there aren't any current scripts available for you to use within SS (except for what might be on the SS wiki page). 
It would likely be something you guys would need to build, that's what other clients have done." Thank you, Brian Marshall Virginia Tech - Advanced Research Computing -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jul 12 14:23:12 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 12 Jul 2016 15:23:12 +0200 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Wed Jul 13 09:49:00 2016 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Wed, 13 Jul 2016 10:49:00 +0200 Subject: [gpfsug-discuss] GPFS / Spectrum Scale is now officially certified with SAP HANA on IBM Power Systrems Message-ID: Hi GPFS / Spectrum Scale "addicts", for all those using GPFS / Spectrum Scale "commercially" - IBM has certified it yesterday with SAP and it is NOW officially supported with HANA on IBM Power Systems. Please see the following SAP Note concerning the details. 2055470 - HANA on POWER Planning and Installation Specifics - Central Note: (See attached file: 2055470.pdf) Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Hechtsheimer Str. 2 Email: ckrafft at de.ibm.com 55131 Mainz Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 52106945.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2055470.pdf Type: application/pdf Size: 101863 bytes Desc: not available URL: From mimarsh2 at vt.edu Wed Jul 13 14:43:43 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Wed, 13 Jul 2016 09:43:43 -0400 Subject: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Message-ID: Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 13 14:59:20 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 13 Jul 2016 13:59:20 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Message-ID: Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a "minimal" install (yes, install using the GUI, don't shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn't find anything specific to this. -------------- next part -------------- An HTML attachment was scrubbed... 
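For what it's worth, a rough sketch of the sort of home-grown aggregation being described above: poll mmpmon fs_io_s on every node and sum the per-filesystem counters. This assumes passwordless ssh to the hosts listed in a nodes.list file and relies on the tag/value pairs (_fs_, _br_, _bw_) that the parseable mmpmon output uses elsewhere in this digest; the counters are cumulative, so a real collector would difference successive samples before graphing them.

    #!/bin/bash
    # sum bytes read/written per filesystem across all nodes (sketch only, no error handling)
    for n in $(cat nodes.list); do
        ssh "$n" 'echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p -s'
    done | awk '
        /_fs_io_s_/ {
            for (i = 1; i < NF; i++) {
                if ($i == "_fs_") fs = $(i+1)
                if ($i == "_br_") br[fs] += $(i+1)
                if ($i == "_bw_") bw[fs] += $(i+1)
            }
        }
        END { for (f in br) printf "%s: bytes_read=%d bytes_written=%d\n", f, br[f], bw[f] }'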
URL: From Robert.Oesterlin at nuance.com Wed Jul 13 17:06:14 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 13 Jul 2016 16:06:14 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: Hi Brian I haven't seen any problems at all with the monitoring. (impacting performance). As for the Zimon metrics - let me assemble that and maybe discuss indetail off the mailing list (I've BCC'd you on this posting. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 9:43 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jul 13 17:08:32 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 13 Jul 2016 16:08:32 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jul 13 17:18:18 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 13 Jul 2016 16:18:18 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: References: Message-ID: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> Hi Bob, I am also in the process of setting up monitoring under GPFS (and it will always be GPFS) 4.2 on our test cluster right now and would also be interested in the experiences of others more experienced and knowledgeable than myself. Would you considering posting to the list? Or is there sensitive information that you don?t want to share on the list? Thanks? Kevin On Jul 13, 2016, at 11:06 AM, Oesterlin, Robert > wrote: Hi Brian I haven't seen any problems at all with the monitoring. (impacting performance). As for the Zimon metrics - let me assemble that and maybe discuss indetail off the mailing list (I've BCC'd you on this posting. 
Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: > on behalf of Brian Marshall > Reply-To: gpfsug main discussion list > Date: Wednesday, July 13, 2016 at 9:43 AM To: "gpfsug-discuss at spectrumscale.org" > Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jul 13 17:29:08 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 13 Jul 2016 16:29:08 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance In-Reply-To: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> References: <1AC5C6E8-98A8-493B-ABB8-53237BAC23F5@Vanderbilt.Edu> Message-ID: <90988968-C133-4965-9A91-13AE1DB8C670@nuance.com> Sure, will do. Nothing sensitive here, just a fairly complex discussion for a mailing list! We'll see - give me a day or so. Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 12:18 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance Hi Bob, I am also in the process of setting up monitoring under GPFS (and it will always be GPFS) 4.2 on our test cluster right now and would also be interested in the experiences of others more experienced and knowledgeable than myself. Would you considering posting to the list? Or is there sensitive information that you don?t want to share on the list? Thanks? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Jul 13 18:09:24 2016 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 13 Jul 2016 17:09:24 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: The gpfs.protocols package drags in all the openstack swift dependencies (lots of packages). I normally don't want the object support, so just install the nfs-ganesha, samba and zimon packages (plus rsync and python-ldap which I've figured out are needed). But, please beware that rhel7.2 isn't supported with v4.2.0 CES, and I've seen kernel crashes triggered by samba when ignoring that.. -jf ons. 13. jul. 2016 kl. 18.08 skrev Simon Thompson (Research Computing - IT Services) : > > The spectrumscale-protocols rpm (I think that was it) should include all > the os dependencies you need for the various ss bits. > > If you were adding the ss rpms by hand, then there are packages you need > to include. Unfortunately the protocols rpm adds all the protocols whether > you want them or not from what I remember. 
> > Simon > ________________________________________ > From: gpfsug-discuss-bounces at spectrumscale.org [ > gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [ > r.sobey at imperial.ac.uk] > Sent: 13 July 2016 14:59 > To: 'gpfsug-discuss at spectrumscale.org' > Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts > > Hi all > > Where can I find documentation on how to prepare RHEL 7.2 for an > installation of SS 4.2 which will be a CES server? Is a ?minimal? install > (yes, install using the GUI, don?t shoot me) sufficient or should I choose > a different canned option. > > Thanks > > Richard > PS I tried looking in the FAQ > http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html > but I couldn?t find anything specific to this. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Jul 14 08:55:33 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 14 Jul 2016 07:55:33 +0000 Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts In-Reply-To: References: Message-ID: Aha. I naively thought it would be. It?s no problem to use 7.1. Thanks for the heads up, and the responses. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jan-Frode Myklebust Sent: 13 July 2016 18:09 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts The gpfs.protocols package drags in all the openstack swift dependencies (lots of packages). I normally don't want the object support, so just install the nfs-ganesha, samba and zimon packages (plus rsync and python-ldap which I've figured out are needed). But, please beware that rhel7.2 isn't supported with v4.2.0 CES, and I've seen kernel crashes triggered by samba when ignoring that.. -jf ons. 13. jul. 2016 kl. 18.08 skrev Simon Thompson (Research Computing - IT Services) >: The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From taylorm at us.ibm.com Fri Jul 15 18:41:27 2016 From: taylorm at us.ibm.com (Michael L Taylor) Date: Fri, 15 Jul 2016 10:41:27 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Sample RHEL 7.2 config Spectrum Scale install toolkit In-Reply-To: References: Message-ID: Hi Richard, The Knowledge Center should help guide you to prepare a RHEL7 node for installation with the /usr/lpp/mmfs/4.2.0.x/installer/spectrumscale install toolkit being a good way to install CES and all of its prerequisites: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_loosein.htm For a high level quick overview of the install toolkit: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Protocols%20Quick%20Overview%20for%20IBM%20Spectrum%20Scale As mentioned, RHEL7.2 will be supported with CES with the 4.2.1 release due out shortly.... RHEL7.1 on 4.2 will work. Today's Topics: 1. Re: Aggregating filesystem performance (Oesterlin, Robert) (Brian Marshall) 2. Sample RHEL 7.2 config / anaconda scripts (Sobey, Richard A) 3. Re: Aggregating filesystem performance (Oesterlin, Robert) 4. Re: Sample RHEL 7.2 config / anaconda scripts (Simon Thompson (Research Computing - IT Services)) 5. Re: Aggregating filesystem performance (Buterbaugh, Kevin L) Message: 4 Date: Wed, 13 Jul 2016 16:08:32 +0000 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Message-ID: Content-Type: text/plain; charset="Windows-1252" The spectrumscale-protocols rpm (I think that was it) should include all the os dependencies you need for the various ss bits. If you were adding the ss rpms by hand, then there are packages you need to include. Unfortunately the protocols rpm adds all the protocols whether you want them or not from what I remember. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Sobey, Richard A [r.sobey at imperial.ac.uk] Sent: 13 July 2016 14:59 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Sample RHEL 7.2 config / anaconda scripts Hi all Where can I find documentation on how to prepare RHEL 7.2 for an installation of SS 4.2 which will be a CES server? Is a ?minimal? install (yes, install using the GUI, don?t shoot me) sufficient or should I choose a different canned option. Thanks Richard PS I tried looking in the FAQ http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html but I couldn?t find anything specific to this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Sun Jul 17 02:04:39 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sat, 16 Jul 2016 21:04:39 -0400 Subject: [gpfsug-discuss] segment size and sub-block size Message-ID: All, When picking blockSize and segmentSize on RAID6 8+2 LUNs, I have see 2 optimal theories. 1) Make blockSize = # Data Disks * segmentSize e.g. in the RAID6 8+2 case, 8 MB blockSize = 8 * 1 MB segmentSize This makes sense to me as every GPFS block write is a full stripe write 2) Make blockSize = 32 (number sub blocks) * segmentSize; also make sure the blockSize is a multiple of #data disks * segmentSize I don't know enough about GPFS to know how subblocks interact and what tradeoffs this makes. 
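To make the arithmetic behind these two options concrete (the numbers below are illustrative only, and the 1/32 sub-block ratio is the one that applies to GPFS releases of this vintage):

# Option 1: blockSize = #data disks * segmentSize (every block write is a full stripe)
#   8+2 RAID6 with 1 MiB segments  ->  blockSize = 8 * 1 MiB = 8 MiB
#   sub-block = blockSize / 32 = 256 KiB  (smallest allocatable unit)
# Option 2: blockSize = 32 * segmentSize (sub-block matches the RAID segment)
#   8+2 RAID6 with 128 KiB segments ->  blockSize = 32 * 128 KiB = 4 MiB
#   sub-block = 128 KiB, and 4 MiB is exactly 4 full stripes of 8 * 128 KiB
# A hypothetical file system creation for option 1 could look like:
mmcrfs fs1 -F nsd.stanza -B 8M
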
Can someone explain (or point to a doc) about sub block mechanics and when to optimize for that? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sun Jul 17 02:20:31 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sat, 16 Jul 2016 21:20:31 -0400 Subject: [gpfsug-discuss] segment size and sub-block size In-Reply-To: References: Message-ID: <9287130c-70ba-207c-221d-f236bad8acaf@nasa.gov> Hi Brian, We use a 128KB segment size on our DDNs and a 1MB block size and it works quite well for us (throughput in the 10's of gigabytes per second). IIRC the sub block (blockSize/32) is the smallest unit of allocatable disk space. If that's not tuned well to your workload you can end up with a lot of wasted space on the filesystem. In option #1, the smallest unit of allocatable space is 256KB. If you have millions of files that are say 8K in size you can do the math on lost space. In option #2, if you're using the same 1MB segment size from the option 1 scenario it gets even worse. Hope that helps. This might also help (https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_frags.htm). -Aaron On 7/16/16 9:04 PM, Brian Marshall wrote: > All, > > When picking blockSize and segmentSize on RAID6 8+2 LUNs, I have see 2 > optimal theories. > > > 1) Make blockSize = # Data Disks * segmentSize > e.g. in the RAID6 8+2 case, 8 MB blockSize = 8 * 1 MB segmentSize > > This makes sense to me as every GPFS block write is a full stripe write > > 2) Make blockSize = 32 (number sub blocks) * segmentSize; also make sure > the blockSize is a multiple of #data disks * segmentSize > > I don't know enough about GPFS to know how subblocks interact and what > tradeoffs this makes. > > Can someone explain (or point to a doc) about sub block mechanics and > when to optimize for that? > > Thank you, > Brian Marshall > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From mimarsh2 at vt.edu Sun Jul 17 03:56:14 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sat, 16 Jul 2016 22:56:14 -0400 Subject: [gpfsug-discuss] SSD LUN setup Message-ID: When setting up SSDs to be used as a fast tier storage pool, are people still doing RAID6 LUNs? I think write endurance is good enough now that this is no longer a big concern (maybe a small concern). I could be wrong. I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Sun Jul 17 14:05:35 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Sun, 17 Jul 2016 13:05:35 +0000 Subject: [gpfsug-discuss] SSD LUN setup Message-ID: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> Thinly provisioned (compressed) metadata volumes is unsupported according to IBM. See the GPFS FAQ here, question 4.12: "Placing GPFS metadata on an NSD backed by a thinly provisioned volume is dangerous and unsupported." 
http://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Saturday, July 16, 2016 at 9:56 PM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] SSD LUN setup I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Sun Jul 17 15:21:13 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Sun, 17 Jul 2016 10:21:13 -0400 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> References: <238458C4-E9C7-4065-9716-C3B38E52445D@nuance.com> Message-ID: That's very good advice. In my specific case, I am looking at lowlevel setup of the NSDs in a SSD storage pool with metadata stored elsewhere (on another SSD system). I am wondering if stuff like SSD pagepool size comes into play or if I just look at the segment size from the storage enclosure RAID controller. It sounds like SSDs should be used just like HDDs: group them into RAID6 LUNs. Write endurance is good enough now that longevity is not a problem and there are plenty of IOPs to do parity work. Does this sound right? Anyone doing anything else? Brian On Sun, Jul 17, 2016 at 9:05 AM, Oesterlin, Robert < Robert.Oesterlin at nuance.com> wrote: > Thinly provisioned (compressed) metadata volumes is unsupported according > to IBM. See the GPFS FAQ here, question 4.12: > > > > "Placing GPFS metadata on an NSD backed by a thinly provisioned volume is > dangerous and unsupported." > > > > http://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > > > > Bob Oesterlin > Sr Storage Engineer, Nuance HPC Grid > 507-269-0413 > > > > > > *From: * on behalf of Brian > Marshall > *Reply-To: *gpfsug main discussion list > *Date: *Saturday, July 16, 2016 at 9:56 PM > *To: *"gpfsug-discuss at spectrumscale.org" > > *Subject: *[EXTERNAL] [gpfsug-discuss] SSD LUN setup > > > > I have read about other products doing RAID1 with deduplication and > compression to take less than the 50% capacity hit. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Sun Jul 17 22:49:53 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Sun, 17 Jul 2016 22:49:53 +0100 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: On 17/07/16 03:56, Brian Marshall wrote: > When setting up SSDs to be used as a fast tier storage pool, are people > still doing RAID6 LUNs? I think write endurance is good enough now that > this is no longer a big concern (maybe a small concern). I could be wrong. > > I have read about other products doing RAID1 with deduplication and > compression to take less than the 50% capacity hit. > There are plenty of ways in which an SSD can fail that does not involve problems with write endurance. The idea of using any disks in anything other than a test/dev GPFS file system that you simply don't care about if it goes belly up, that are not RAID or similarly protected is in my view fool hardy in the extreme. 
It would be like saying that HDD's can only fail due to surface defects on the platers, and then getting stung when the drive motor fails or the drive electronics stop working or better yet the drive electrics go puff literately in smoke and there is scorch marks on the PCB. Or how about a drive firmware issue that causes them to play dead under certain work loads, or drive firmware issues that just cause them to die prematurely in large numbers. These are all failure modes I have personally witnessed. My sample size for SSD's is still way to small to have seen lots of wacky failure modes, but I don't for one second believe that given time I won't see them. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Greg.Lehmann at csiro.au Mon Jul 18 00:23:09 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 17 Jul 2016 23:23:09 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Message-ID: Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I've seen reference to a kernel version that is in SLES 12 SP1, but I'm not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Jul 18 01:39:29 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Jul 2016 00:39:29 +0000 Subject: [gpfsug-discuss] Aggregating filesystem performance Message-ID: OK, after a bit of a delay due to a hectic travel week, here is some more information on my GPFS performance collection. At the bottom, I have links to my server and client zimon config files and a link to my presentation at SSUG Argonne in June. I didn't actually present it but included it in case there was interest. I used to do a home brew system of period calls to mmpmon to collect data, sticking them into a kafka database. This was a bit cumbersome and when SS 4.2 arrived, I switched over to the built in performance sensors (zimon) to collect the data. IBM has a "as-is" bridge between Grafana and the Zimon collector that works reasonably well - they were supposed to release it but it's been delayed - I will ask about it again and post more information if I get it. My biggest struggle with the zimon configuration is the large memory requirement of the collector with large clusters (many clients, file systems, NSDs). I ended up deploying a 6 collector federation of 16gb per collector for my larger clusters -0 even then I have to limit the number of stats and amount of time I retain it. IBM is aware of the memory issue and I believe they are looking at ways to reduce it. As for what specific metrics I tend to look at: gpfs_fis_bytes_read (written) - aggregated file system read and write stats gpfs_nsdpool_bytes_read (written) - aggregated pool stats, as I have data and metadata split gpfs_fs_tot_disk_wait_rd (wr) - NSD disk wait stats These seem to make the most sense for me to get an overall sense of how things are going. I have a bunch of other more details dashboards for individual file systems and clients that help me get details. The built-in SS GUI is pretty good for small clusters, and is getting some improvements in 4.2.1 that might make me take a closer look at it again. 
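As a rough illustration of pulling those aggregate metrics straight out of the collector on a 4.2-level cluster, queries along these lines can be used (the metric names are the ones listed above; the exact option spelling is an assumption and may vary slightly between releases):

# Last 10 one-minute buckets of aggregated file system read/write traffic
mmperfmon query gpfs_fis_bytes_read,gpfs_fis_bytes_written -b 60 -n 10

# The NSD disk wait metrics mentioned above, over the same window
mmperfmon query gpfs_fs_tot_disk_wait_rd,gpfs_fs_tot_disk_wait_wr -b 60 -n 10
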
I also look at the RPC waiters stats - no present in 4.2.0 grafana, but I hear are coming in 4.2.1 My SSUG Argonne Presentation (I didn't talk due to time constraints): http://files.gpfsug.org/presentations/2016/anl-june/SSUG_Nuance_PerfTools.pdf Zimon server config file: https://www.dropbox.com/s/gvtfhhqfpsknfnh/ZIMonSensors.cfg.server?dl=0 Zimon client config file: https://www.dropbox.com/s/k5i6rcnaco4vxu6/ZIMonSensors.cfg.client?dl=0 Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid From: on behalf of Brian Marshall Reply-To: gpfsug main discussion list Date: Wednesday, July 13, 2016 at 8:43 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] Re: [gpfsug-discuss] Aggregating filesystem performance (Oesterlin, Robert) Robert, 1) Do you see any noticeable performance impact by running the performance monitoring? 2) Can you share the zimon configuration that you use? i.e. what metrics do you find most useful? Thank you, Brian Marshall -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Mon Jul 18 15:07:51 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Mon, 18 Jul 2016 10:07:51 -0400 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: @Jonathan, I completely agree on the SSD failure. I wasn't suggesting that better write endurance made them impervious to failures, just that I read a few articles from ~3-5 years back saying that RAID5 or RAID6 would destroy your SSDs and have a really high probability of all SSDs failing at the same time as the # of writes were equal on all SSDs in the RAID group. I think that's no longer the case and RAID6 on SSDs is fine. I was looking for examples of what others have done: RAID6, using GPFS data replicas, or some other thing I don't know about that better takes advantage of SSD architecture. Background - I am a storage noob Also is the @Jonathan proper list etiquette? Thanks everyone to great advice I've been getting. Thank you, Brian On Sun, Jul 17, 2016 at 5:49 PM, Jonathan Buzzard wrote: > On 17/07/16 03:56, Brian Marshall wrote: > >> When setting up SSDs to be used as a fast tier storage pool, are people >> still doing RAID6 LUNs? I think write endurance is good enough now that >> this is no longer a big concern (maybe a small concern). I could be >> wrong. >> >> I have read about other products doing RAID1 with deduplication and >> compression to take less than the 50% capacity hit. >> >> > There are plenty of ways in which an SSD can fail that does not involve > problems with write endurance. The idea of using any disks in anything > other than a test/dev GPFS file system that you simply don't care about if > it goes belly up, that are not RAID or similarly protected is in my view > fool hardy in the extreme. > > It would be like saying that HDD's can only fail due to surface defects on > the platers, and then getting stung when the drive motor fails or the drive > electronics stop working or better yet the drive electrics go puff > literately in smoke and there is scorch marks on the PCB. Or how about a > drive firmware issue that causes them to play dead under certain work > loads, or drive firmware issues that just cause them to die prematurely in > large numbers. > > These are all failure modes I have personally witnessed. My sample size > for SSD's is still way to small to have seen lots of wacky failure modes, > but I don't for one second believe that given time I won't see them. > > JAB. > > -- > Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From UWEFALKE at de.ibm.com Mon Jul 18 18:34:38 2016 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Mon, 18 Jul 2016 19:34:38 +0200 Subject: [gpfsug-discuss] SSD LUN setup In-Reply-To: References: Message-ID: Hi Brian, write endurance is one thing you need to run small IOs on on RAID5/RAID6. However, while SSDs are much faster than HDDs when it comes to reads, they are just faster when it comes to writes. The RMW penalty on small writes to RAID5 / RAID6 will incur a higher actual data write rate at your SSD devices than you see going from your OS / file system to the storage. How much higher depends on the actual IO sizes to the RAID device related to your full stripe widths. Mind that the write caches on all levels will help here getting the the IOs larger than what the application does. Beyond a certain point, however, if you go to smaller and smaller IOs (in relation to your stripe widths) you might want to look for some other redundancy code than RAID5/RAID6 or related parity-using mechanisms even if you pay the capacity price of simple data replication (RAID1, or 3w in GNR). That depends of course, but is worth a consideration. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Frank Hammer, Thorsten Moehring Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: Brian Marshall To: gpfsug main discussion list Date: 07/18/2016 04:08 PM Subject: Re: [gpfsug-discuss] SSD LUN setup Sent by: gpfsug-discuss-bounces at spectrumscale.org @Jonathan, I completely agree on the SSD failure. I wasn't suggesting that better write endurance made them impervious to failures, just that I read a few articles from ~3-5 years back saying that RAID5 or RAID6 would destroy your SSDs and have a really high probability of all SSDs failing at the same time as the # of writes were equal on all SSDs in the RAID group. I think that's no longer the case and RAID6 on SSDs is fine. I was looking for examples of what others have done: RAID6, using GPFS data replicas, or some other thing I don't know about that better takes advantage of SSD architecture. Background - I am a storage noob Also is the @Jonathan proper list etiquette? Thanks everyone to great advice I've been getting. Thank you, Brian On Sun, Jul 17, 2016 at 5:49 PM, Jonathan Buzzard wrote: On 17/07/16 03:56, Brian Marshall wrote: When setting up SSDs to be used as a fast tier storage pool, are people still doing RAID6 LUNs? I think write endurance is good enough now that this is no longer a big concern (maybe a small concern). I could be wrong. 
I have read about other products doing RAID1 with deduplication and compression to take less than the 50% capacity hit. There are plenty of ways in which an SSD can fail that does not involve problems with write endurance. The idea of using any disks in anything other than a test/dev GPFS file system that you simply don't care about if it goes belly up, that are not RAID or similarly protected is in my view fool hardy in the extreme. It would be like saying that HDD's can only fail due to surface defects on the platers, and then getting stung when the drive motor fails or the drive electronics stop working or better yet the drive electrics go puff literately in smoke and there is scorch marks on the PCB. Or how about a drive firmware issue that causes them to play dead under certain work loads, or drive firmware issues that just cause them to die prematurely in large numbers. These are all failure modes I have personally witnessed. My sample size for SSD's is still way to small to have seen lots of wacky failure modes, but I don't for one second believe that given time I won't see them. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Tue Jul 19 08:59:43 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 19 Jul 2016 07:59:43 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Message-ID: I thought it was supported, but that CES (Integrated protocols support) is only supported up to 7.1 Simon From: > on behalf of "Greg.Lehmann at csiro.au" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 18 July 2016 at 00:23 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I?ve seen reference to a kernel version that is in SLES 12 SP1, but I?m not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Jul 20 01:17:23 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 20 Jul 2016 00:17:23 +0000 Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale In-Reply-To: References: Message-ID: You are right. An IBMer cleared it up for me. 
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, 19 July 2016 6:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale I thought it was supported, but that CES (Integrated protocols support) is only supported up to 7.1 Simon From: > on behalf of "Greg.Lehmann at csiro.au" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Monday, 18 July 2016 at 00:23 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] SLES 12 SP1 support for Spectrum Scale Hi All, Given the issues with supporting RHEL 7.2 I am wondering about the latest SLES release and support. Is anybody running actually running it on SLES 12 SP1. I've seen reference to a kernel version that is in SLES 12 SP1, but I'm not sure I trust it as the same document also says RHEL 7.2 is supported for SS 4.2. http://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html Cheers, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Jul 20 13:21:19 2016 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Jul 2016 13:21:19 +0100 Subject: [gpfsug-discuss] New AFM Toys Message-ID: Just noticed this in the 4.2.0-4 release notes: * Fix the readdir performance issue of independent writer mode filesets in the AFM environment. Introduce a new configuration option afmDIO at the fileset level to replicate data from cache to home using direct IO. Before I start weeping tears of joy, does anyone have any further info on this (the issue and the new parameter?) Does this apply to both NFS and GPFS transpots? It looks very promising! -- Barry Evans Technical Director & Co-Founder Pixit Media Mobile: +44 (0)7950 666 248 http://www.pixitmedia.com -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Jul 20 15:42:09 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Jul 2016 14:42:09 +0000 Subject: [gpfsug-discuss] Migrating to CES from CTDB Message-ID: Hi all, Does anyone have any experience migrating from CTDB and GPFS 3.5 to CES and GPFS 4.2? We've got a plan of how to do it, but the issue is doing it without causing any downtime to the front end. We're using "secrets and keytab" for auth in smb.conf. So the only way I think we can do it is build out the 4.2 servers and somehow integrate them into the existing cluster (front end cluster) - or more accurately - keep the same FQDN of the cluster and just change DNS to point the FDQN to the new servers, and remove it from the existing ones. The big question is: will this work in theory? The downtime option involves shutting down the CTDB cluster, deleting the AD object, renaming the new cluster and starting CES SMB to allow it to join AD with the same name. 
This takes about 15 minutes while allowing for AD replication and the TTL on the DNS. This then has to be repeated when the original CTDB nodes have been reinstalled. I feel like I'm rambling but I just can't find any guides on migrating protocols from CTDB to CES, just migrations of GPFS itself. Plus my knowledge of samba isn't all that :) Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Jul 20 16:15:39 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 20 Jul 2016 16:15:39 +0100 Subject: [gpfsug-discuss] Migrating to CES from CTDB In-Reply-To: References: Message-ID: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> On Wed, 2016-07-20 at 14:42 +0000, Sobey, Richard A wrote: [SNIP] > > The downtime option involves shutting down the CTDB cluster, deleting > the AD object, renaming the new cluster and starting CES SMB to allow > it to join AD with the same name. This takes about 15 minutes while > allowing for AD replication and the TTL on the DNS. This then has to > be repeated when the original CTDB nodes have been reinstalled. > Can you not reduce the TTL on the DNS to as low as possible prior to the changeover to reduce the required downtime for the switch over? You are also aware that you can force the AD replication so no need to wait for that, other than the replication time, which should be pretty quick? I also believe that it is not necessary to delete the AD object. Just leave it as is, and it will be overwritten when you join the new CES cluster. Saves you a step. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From r.sobey at imperial.ac.uk Wed Jul 20 16:23:00 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Jul 2016 15:23:00 +0000 Subject: [gpfsug-discuss] Migrating to CES from CTDB In-Reply-To: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> References: <1469027739.26989.8.camel@buzzard.phy.strath.ac.uk> Message-ID: I was thinking of that. Current TTL is 900s, we can probably lower it on a temporary basis to facilitate the change. I wasn't aware about the AD object, no... I presume the existing object will simply be updated when the new cluster joins, which in turn will trigger a replication of it anyway? Thanks Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: 20 July 2016 16:16 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Migrating to CES from CTDB On Wed, 2016-07-20 at 14:42 +0000, Sobey, Richard A wrote: [SNIP] > > The downtime option involves shutting down the CTDB cluster, deleting > the AD object, renaming the new cluster and starting CES SMB to allow > it to join AD with the same name. This takes about 15 minutes while > allowing for AD replication and the TTL on the DNS. This then has to > be repeated when the original CTDB nodes have been reinstalled. > Can you not reduce the TTL on the DNS to as low as possible prior to the changeover to reduce the required downtime for the switch over? You are also aware that you can force the AD replication so no need to wait for that, other than the replication time, which should be pretty quick? I also believe that it is not necessary to delete the AD object. Just leave it as is, and it will be overwritten when you join the new CES cluster. Saves you a step. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Wed Jul 20 18:23:32 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 17:23:32 +0000 Subject: [gpfsug-discuss] More fun with Policies Message-ID: Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E281.87334DC0] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30212 bytes Desc: image001.png URL: From jamiedavis at us.ibm.com Wed Jul 20 19:17:09 2016 From: jamiedavis at us.ibm.com (James Davis) Date: Wed, 20 Jul 2016 18:17:09 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.image001.png at 01D1E281.87334DC0.png Type: image/png Size: 30212 bytes Desc: not available URL: From Mark.Bush at siriuscom.com Wed Jul 20 19:24:02 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 18:24:02 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: References: Message-ID: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> Thanks James. I did just that (running 4.2.0.3). mmchpolicy fs1 DEFAULT. It didn?t fix the gui however I wonder if it?s a bug in the gui code or something like that. From: on behalf of James Davis Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:17 PM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] More fun with Policies Hi Mark, I don't have an answer about the GUI change, but I believe as of 4.1 you can "delete" a policy by using mmchpolicy like this: #14:15:36# c42an3:~ # mmchpolicy c42_fs2_dmapi DEFAULT GPFS: 6027-2809 Validated policy 'DEFAULT': GPFS: 6027-799 Policy `DEFAULT' installed and broadcast to all nodes. 
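For anyone following along, the sequence under discussion can be sketched as follows; the file system name fs1 comes from the thread, while the policy file path and its contents are an illustrative assumption:

# Revert to the built-in placement policy (what was run above)
mmchpolicy fs1 DEFAULT

# Or install a minimal one-rule placement policy from a file and verify it,
# e.g. a file /tmp/placement.pol containing:  RULE 'default' SET POOL 'system'
mmchpolicy fs1 /tmp/placement.pol
mmlspolicy fs1 -L
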
#14:16:06# c42an3:~ # mmlspolicy c42_fs2_dmapi -L /* DEFAULT */ /* Store data in the first data pool or system pool */ If your release doesn't support that, try a simple policy like RULE 'default' SET POOL 'system' or RULE 'default' SET POOL '' Cheers, Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] More fun with Policies Date: Wed, Jul 20, 2016 1:24 PM Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E289.FAA9E450] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30213 bytes Desc: image001.png URL: From Mark.Bush at siriuscom.com Wed Jul 20 19:45:41 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 18:45:41 +0000 Subject: [gpfsug-discuss] More fun with Policies In-Reply-To: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> References: <09C9ADED-936E-4779-83D7-4D9A008E4302@siriuscom.com> Message-ID: I killed my browser cache and all is well now. From: on behalf of Mark Bush Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] More fun with Policies Thanks James. I did just that (running 4.2.0.3). mmchpolicy fs1 DEFAULT. It didn?t fix the gui however I wonder if it?s a bug in the gui code or something like that. 
From: on behalf of James Davis Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 1:17 PM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] More fun with Policies Hi Mark, I don't have an answer about the GUI change, but I believe as of 4.1 you can "delete" a policy by using mmchpolicy like this: #14:15:36# c42an3:~ # mmchpolicy c42_fs2_dmapi DEFAULT GPFS: 6027-2809 Validated policy 'DEFAULT': GPFS: 6027-799 Policy `DEFAULT' installed and broadcast to all nodes. #14:16:06# c42an3:~ # mmlspolicy c42_fs2_dmapi -L /* DEFAULT */ /* Store data in the first data pool or system pool */ If your release doesn't support that, try a simple policy like RULE 'default' SET POOL 'system' or RULE 'default' SET POOL '' Cheers, Jamie Jamie Davis GPFS Functional Verification Test (FVT) jamiedavis at us.ibm.com ----- Original message ----- From: "Mark.Bush at siriuscom.com" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] More fun with Policies Date: Wed, Jul 20, 2016 1:24 PM Can anyone explain why I can?t apply a policy in the GUI any longer? Here?s my screenshot of what I?m presented with now after I created a policy from the CLI [cid:image001.png at 01D1E28D.0011FCE0] See how the editor is missing as well as the typical buttons on the bottom of the Leftmost column? Also how does one delete a policy? There?s no mmdelpolicy Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 30214 bytes Desc: image001.png URL: From Mark.Bush at siriuscom.com Wed Jul 20 21:47:13 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 20:47:13 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario Message-ID: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. 
Do I solve this by having scale replication set to 2 for all my files? I mean, a single site I think I get; it's when there are two datacenters that I don't want two clusters, typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Wed Jul 20 22:15:49 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 20 Jul 2016 17:15:49 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID:

Site1: Server1 (quorum 1), Server2
Site2: Server3 (quorum 2), Server4
SiteX: Server5 (quorum 3)

You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site/location. This ensures you can still access your data even if one of your sites goes down.

You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: the majority of the quorum nodes need to be up to survive an outage.
- With 3 quorum nodes you can have 1 quorum node failure and continue filesystem operations.
- With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations.
- With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations.
- etc

Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks.
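As a rough CLI sketch of how the node roles above might be put in place (node and NSD names here are illustrative, not taken from this thread):

# Designate one quorum node per site plus the isolated third-site node
mmchnode --quorum -N server1,server3,server5

# With only two sites, a tiebreaker disk visible to the quorum nodes can be
# used instead of a third-site quorum node:
mmchconfig tiebreakerDisks="nsd_tie1"

# Verify the resulting designations
mmlscluster
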
I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Wed Jul 20 22:27:03 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 20 Jul 2016 21:27:03 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Hi Mark, We do this. 
We have sync replication between two sites with extended san and Ethernet fabric between them. We then use copies=2 for both metadata and data (most filesets). We then also have a vm quorum node which runs on VMware in a fault tolerant cluster. We tested split braining the sites before we went into production. It does work, but we did find some interesting failure modes doing the testing, so do that and push it hard. We multicluster our ces nodes (yes technically I know isn't supported), and again have a quorum vm which has dc affinity to the storage cluster one to ensure ces fails to the same DC. You may also want to look at readReplicaPolicy=local and Infiniband fabric numbers, and probably subnets to ensure your clients prefer the local site for read. Write of course needs enough bandwidth between sites to keep it fast. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Mark.Bush at siriuscom.com [Mark.Bush at siriuscom.com] Sent: 20 July 2016 21:47 To: gpfsug main discussion list Subject: [gpfsug-discuss] NDS in Two Site scenario For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush | Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions From makaplan at us.ibm.com Wed Jul 20 22:52:25 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 20 Jul 2016 17:52:25 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. 
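To map the settings mentioned in this thread (copies=2 for data and metadata, per-site failure groups, readReplicaPolicy=local) onto commands, a minimal sketch might look like the following; the file system name, NSD names and failure-group numbers are illustrative assumptions:

# NSD stanzas giving each site its own failure group, e.g. in nsd.stanza:
#   %nsd: nsd=site1_nsd1 servers=server1,server2 usage=dataAndMetadata failureGroup=1
#   %nsd: nsd=site2_nsd1 servers=server3,server4 usage=dataAndMetadata failureGroup=2

# Create the file system with two copies of data and metadata
mmcrfs gpfs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2

# Or raise the defaults on an existing file system (its -M/-R maximums permitting)
# and re-replicate existing files:
mmchfs gpfs1 -m 2 -r 2
mmrestripefs gpfs1 -R

# Prefer the locally accessible replica for reads, as mentioned above
mmchconfig readReplicaPolicy=local
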
-------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Wed Jul 20 23:34:53 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Wed, 20 Jul 2016 23:34:53 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: <3e1dc902-ca52-4ab1-2ca3-e51ba3f18b32@buzzard.me.uk> On 20/07/16 22:15, Ken Hill wrote: [SNIP] > You can further isolate failure by increasing quorum (odd numbers). > > The way quorum works is: The majority of the quorum nodes need to be up > to survive an outage. > > - With 3 quorum nodes you can have 1 quorum node failures and continue > filesystem operations. > - With 5 quorum nodes you can have 2 quorum node failures and continue > filesystem operations. > - With 7 quorum nodes you can have 3 quorum node failures and continue > filesystem operations. > - etc > The alternative is a tiebreaker disk to prevent split brain clusters. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Mark.Bush at siriuscom.com Thu Jul 21 00:33:06 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 23:33:06 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. 
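(A rough sketch of the two quorum approaches just described: Ken's three-site node quorum, or the tiebreaker-disk alternative Jonathan mentions. Node and NSD names here are invented; as Jonathan notes, tiebreaker disks only work where the quorum nodes can all see the disk.)

    # Odd number of quorum nodes, one per site plus the third site:
    mmchnode --quorum -N server1,server3,server5

    # Or, with only two sites, node quorum with tiebreaker disks:
    mmchconfig tiebreakerDisks="nsd_tb1;nsd_tb2;nsd_tb3"

    # Check the result:
    mmlscluster
    mmlsconfig tiebreakerDisks
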
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E2B5.280C2EA0] [cid:image002.png at 01D1E2B5.280C2EA0] [cid:image003.png at 01D1E2B5.280C2EA0] [cid:image004.png at 01D1E2B5.280C2EA0] [cid:image005.png at 01D1E2B5.280C2EA0] [cid:image006.png at 01D1E2B5.280C2EA0] [cid:image007.png at 01D1E2B5.280C2EA0] [cid:image008.png at 01D1E2B5.280C2EA0] [cid:image009.png at 01D1E2B5.280C2EA0] [cid:image010.png at 01D1E2B5.280C2EA0] [cid:image011.png at 01D1E2B5.280C2EA0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: From Mark.Bush at siriuscom.com Thu Jul 21 00:34:29 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Wed, 20 Jul 2016 23:34:29 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenh at us.ibm.com Thu Jul 21 01:02:01 2016 From: kenh at us.ibm.com (Ken Hill) Date: Wed, 20 Jul 2016 20:02:01 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. 
If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone: 1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 979 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1426 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1369 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4454 bytes Desc: not available URL: From YARD at il.ibm.com Thu Jul 21 05:48:09 2016 From: YARD at il.ibm.com (Yaron Daniel) Date: Thu, 21 Jul 2016 07:48:09 +0300 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: HI U must remember the following: Network vlan should be the same between 2 Main Sites - since the CES IP failover will not work... U can define : Site1 - 2 x NSD servers + Quorum Site2 - 2 x NSD servers + Quorum GPFS FS replication define with failure groups. (Latency must be very low in order to have write performance). Site3 - 1 x Quorum + Local disk as Tie Breaker Disk. (Desc Only) Hope this help. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com IBM Israel From: "Ken Hill" To: gpfsug main discussion list Date: 07/21/2016 03:02 AM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. 
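(Yaron's third-site descriptor-only tiebreaker disk might be defined roughly as below; the device path, server name, NSD name and file system name are placeholders. A descOnly disk holds only a copy of the file system descriptor, no data or metadata.)

    # desc_only.stanza
    %nsd: device=/dev/sdb
      nsd=site3_desc1
      servers=server5
      usage=descOnly
      failureGroup=3

    # Create the NSD and add it to the file system:
    mmcrnsd -F desc_only.stanza
    mmadddisk gpfs1 -F desc_only.stanza
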
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1620 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1596 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 978 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 1563 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1312 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1167 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1425 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1368 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1243 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4453 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1621 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 979 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1564 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1313 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1426 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1369 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1244 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4454 bytes Desc: not available URL: From ashish.thandavan at cs.ox.ac.uk Thu Jul 21 11:26:02 2016 From: ashish.thandavan at cs.ox.ac.uk (Ashish Thandavan) Date: Thu, 21 Jul 2016 11:26:02 +0100 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience Message-ID: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Dear all, Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. 
We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. Is there a recommended bonding mode? If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? Thank you, Regards, Ash -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk From Mark.Bush at siriuscom.com Thu Jul 21 13:45:12 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 12:45:12 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). 
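(A rough stanza-file sketch of Mark's second layout above, with each LUN defined behind all four NSD servers and one failure group per site; device paths, node names and NSD names are invented for illustration. With -m 2 -r 2 on the file system, each block then gets one copy in failure group 1 and one in failure group 2. The order of the servers list sets the preference order.)

    %nsd: device=/dev/mapper/san1_lun1
      nsd=site1_nsd1
      servers=node1,node2,node3,node4
      usage=dataAndMetadata
      failureGroup=1

    %nsd: device=/dev/mapper/san2_lun1
      nsd=site2_nsd1
      servers=node3,node4,node1,node2
      usage=dataAndMetadata
      failureGroup=2
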
Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E323.CF61AAE0] [cid:image002.png at 01D1E323.CF61AAE0] [cid:image003.png at 01D1E323.CF61AAE0] [cid:image004.png at 01D1E323.CF61AAE0] [cid:image005.png at 01D1E323.CF61AAE0] [cid:image006.png at 01D1E323.CF61AAE0] [cid:image007.png at 01D1E323.CF61AAE0] [cid:image008.png at 01D1E323.CF61AAE0] [cid:image009.png at 01D1E323.CF61AAE0] [cid:image010.png at 01D1E323.CF61AAE0] [cid:image011.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E323.CF61AAE0] [cid:image013.png at 01D1E323.CF61AAE0] [cid:image014.png at 01D1E323.CF61AAE0] [cid:image015.png at 01D1E323.CF61AAE0] [cid:image016.png at 01D1E323.CF61AAE0] [cid:image017.png at 01D1E323.CF61AAE0] [cid:image018.png at 01D1E323.CF61AAE0] [cid:image019.png at 01D1E323.CF61AAE0] [cid:image020.png at 01D1E323.CF61AAE0] [cid:image021.png at 01D1E323.CF61AAE0] [cid:image022.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? 
I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 1622 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image013.png Type: image/png Size: 1598 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 1073 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 980 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 1565 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 1314 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 1169 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image019.png Type: image/png Size: 1427 bytes Desc: image019.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 1370 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.png Type: image/png Size: 1245 bytes Desc: image021.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.png Type: image/png Size: 4455 bytes Desc: image022.png URL: From jonathan at buzzard.me.uk Thu Jul 21 14:01:06 2016 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 21 Jul 2016 14:01:06 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: <1469106066.26989.33.camel@buzzard.phy.strath.ac.uk> On Thu, 2016-07-21 at 12:45 +0000, Mark.Bush at siriuscom.com wrote: > This is where my confusion sits. So if I have two sites, and two NDS > Nodes per site with 1 NSD (to keep it simple), do I just present the > physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to > Site2 NSD Nodes? Unless you are going to use a tiebreaker disk you need an odd number of NSD nodes. If you don't you risk a split brain cluster and well god only knows what will happen to your file system in such a scenario. > Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and > the same at Site2? (Assuming SAN and not direct attached in this > case). I know I?m being persistent but this for some reason confuses > me. That's one way of doing it assuming that you have extended your SAN across both sites. You present all LUN's to all NSD nodes regardless of which site they are at. With this method you can use a tiebreaker disk. Alternatively you present the LUN's at site one to the NSD servers at site one and all the LUN's at site two to the NSD servers at site two, and set failure and replication groups up appropriately. However in this scenario it is critical to have an odd number of NSD servers because you can only use tiebreaker disks where every NSD node can see the physical disk aka it's SAN attached (either FC or iSCSI) to all NSD nodes. That said as others have pointed out, beyond a metropolitan area network I can't see multi site GPFS working. You could I guess punt iSCSI over the internet but performance is going to be awful, and iSCSI and GPFS just don't mix in my experience. JAB. -- Jonathan A. 
Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From S.J.Thompson at bham.ac.uk Thu Jul 21 14:02:03 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 21 Jul 2016 13:02:03 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: It depends. What are you protecting against? Either will work depending on your acceptable failure modes. I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site. In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written as the NSD servers in Site 2 could write to the SAN in Site 1. In the first example you would need to rest ripe the file-system when brining the Site 1 back online to ensure data is replicated.\ Simon From: > on behalf of "Mark.Bush at siriuscom.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 13:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E323.CF61AAE0] [cid:image002.png at 01D1E323.CF61AAE0] [cid:image003.png at 01D1E323.CF61AAE0] [cid:image004.png at 01D1E323.CF61AAE0] [cid:image005.png at 01D1E323.CF61AAE0] [cid:image006.png at 01D1E323.CF61AAE0] [cid:image007.png at 01D1E323.CF61AAE0] [cid:image008.png at 01D1E323.CF61AAE0] [cid:image009.png at 01D1E323.CF61AAE0] [cid:image010.png at 01D1E323.CF61AAE0] [cid:image011.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? 
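(Picking up Simon's restripe point a few lines up: once the failed site's disks are reachable again, the recovery would look roughly like this. The file system name is a placeholder.)

    # See which disks are not up and ready:
    mmlsdisk gpfs1 -e

    # Start the recovered disks; this brings stale replicas back up to date:
    mmchdisk gpfs1 start -a

    # Optionally repair replication of any remaining ill-replicated files:
    mmrestripefs gpfs1 -r
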
From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E323.CF61AAE0] [cid:image013.png at 01D1E323.CF61AAE0] [cid:image014.png at 01D1E323.CF61AAE0] [cid:image015.png at 01D1E323.CF61AAE0] [cid:image016.png at 01D1E323.CF61AAE0] [cid:image017.png at 01D1E323.CF61AAE0] [cid:image018.png at 01D1E323.CF61AAE0] [cid:image019.png at 01D1E323.CF61AAE0] [cid:image020.png at 01D1E323.CF61AAE0] [cid:image021.png at 01D1E323.CF61AAE0] [cid:image022.png at 01D1E323.CF61AAE0] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. 
If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1621 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1597 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1072 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.png Type: image/png Size: 979 bytes Desc: image004.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image005.png Type: image/png Size: 1564 bytes Desc: image005.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image006.png Type: image/png Size: 1313 bytes Desc: image006.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image007.png Type: image/png Size: 1168 bytes Desc: image007.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image008.png Type: image/png Size: 1426 bytes Desc: image008.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image009.png Type: image/png Size: 1369 bytes Desc: image009.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image010.png Type: image/png Size: 1244 bytes Desc: image010.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image011.png Type: image/png Size: 4454 bytes Desc: image011.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image012.png Type: image/png Size: 1622 bytes Desc: image012.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image013.png Type: image/png Size: 1598 bytes Desc: image013.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image014.png Type: image/png Size: 1073 bytes Desc: image014.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image015.png Type: image/png Size: 980 bytes Desc: image015.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image016.png Type: image/png Size: 1565 bytes Desc: image016.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image017.png Type: image/png Size: 1314 bytes Desc: image017.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image018.png Type: image/png Size: 1169 bytes Desc: image018.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image019.png Type: image/png Size: 1427 bytes Desc: image019.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image020.png Type: image/png Size: 1370 bytes Desc: image020.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image021.png Type: image/png Size: 1245 bytes Desc: image021.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image022.png Type: image/png Size: 4455 bytes Desc: image022.png URL: From viccornell at gmail.com Thu Jul 21 14:02:02 2016 From: viccornell at gmail.com (Vic Cornell) Date: Thu, 21 Jul 2016 14:02:02 +0100 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: The annoying answer is "it depends?. I ran a system with all of the NSDs being visible to all of the NSDs on both sites and that worked well. However there are lots of questions to answer: Where are the clients going to live? Will you have clients in both sites or just one? Is it dual site working or just DR? Where will the majority of the writes happen? Would you rather that traffic went over the SAN or the IP link? Do you have a SAN link between the 2 sites? Which is faster, the SAN link between sites or the IP link between the sites? Are they the same link? Are they both redundant, which is the most stable? The answers to these questions would drive the design of the gpfs filesystem. For example if there are clients on only on site A , you might then make the NSD servers on site A the primary NSD servers for all of the NSDs on site A and site B - then you send the replica blocks over the SAN. You also could make a matrix of the failure scenarios you want to protect against, the consequences of the failure and the likelihood of failure etc. That will also inform the design. Does that help? Vic > On 21 Jul 2016, at 1:45 pm, Mark.Bush at siriuscom.com wrote: > > This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I?m being persistent but this for some reason confuses me. > > Site1 > NSD Node1 > ---NSD1 ---Physical LUN1 from SAN1 > NSD Node2 > > > Site2 > NSD Node3 > ---NSD2 ?Physical LUN2 from SAN2 > NSD Node4 > > > Or > > > Site1 > NSD Node1 > ----NSD1 ?Physical LUN1 from SAN1 > ----NSD2 ?Physical LUN2 from SAN2 > NSD Node2 > > Site 2 > NSD Node3 > ---NSD2 ? Physical LUN2 from SAN2 > ---NSD1 --Physical LUN1 from SAN1 > NSD Node4 > > > Site 3 > Node5 Quorum > > > > From: > on behalf of Ken Hill > > Reply-To: gpfsug main discussion list > > Date: Wednesday, July 20, 2016 at 7:02 PM > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > > Yes - it is a cluster. > > The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). 
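(A rough illustration of Vic's suggestion above about making the site A NSD servers primary for all NSDs: list the site A nodes first in each NSD's server list so clients on site A are served locally and replica traffic goes over the SAN. NSD and node names are placeholders; check the mmchnsd documentation for your level as to whether the file system needs to be unmounted first.)

    mmchnsd "site1_nsd1:nodeA1,nodeA2,nodeB1,nodeB2"
    mmchnsd "site2_nsd1:nodeA1,nodeA2,nodeB1,nodeB2"

    # Check the resulting server order:
    mmlsnsd -f gpfs1
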
> > Regards, > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > From: "Mark.Bush at siriuscom.com " > > To: gpfsug main discussion list > > Date: 07/20/2016 07:33 PM > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > So in this scenario Ken, can server3 see any disks in site1? > > From: > on behalf of Ken Hill > > Reply-To: gpfsug main discussion list > > Date: Wednesday, July 20, 2016 at 4:15 PM > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario > > > Site1 Site2 > Server1 (quorum 1) Server3 (quorum 2) > Server2 Server4 > > > > > SiteX > Server5 (quorum 3) > > > > > You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. > > You can further isolate failure by increasing quorum (odd numbers). > > The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. > > - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. > - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. > - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. > - etc > > Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. > > Ken Hill > Technical Sales Specialist | Software Defined Solution Sales > IBM Systems > Phone:1-540-207-7270 > E-mail: kenh at us.ibm.com > > > 2300 Dulles Station Blvd > Herndon, VA 20171-6133 > United States > > > > > > > From: "Mark.Bush at siriuscom.com " > > To: gpfsug main discussion list > > Date: 07/20/2016 04:47 PM > Subject: [gpfsug-discuss] NDS in Two Site scenario > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > > For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. > > > > Mark R. Bush| Solutions Architect > Mobile: 210.237.8415 | mark.bush at siriuscom.com > Sirius Computer Solutions | www.siriuscom.com > 10100 Reunion Place, Suite 500, San Antonio, TX 78216 > > This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. 
This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. > > Sirius Computer Solutions _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Mark.Bush at siriuscom.com Thu Jul 21 14:12:58 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 13:12:58 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: <10D22907-E641-41AF-A31A-17755288E005@siriuscom.com> Thanks Vic and Simon. I'm totally cool with "it depends". The solution guidance is to achieve a highly available FS, and there is dark fibre between the two locations. FileNet is the application and they want two things: the ability to write in both locations (perhaps close to the same time, though not necessarily to the same files) and protection against any site failure. So in my mind my Scenario 1 would work as long as I have copies=2 and a restripe is acceptable. In my Scenario 2 I would still have to restripe if the SAN in site 1 went down. I'm looking for the simplest approach that provides the greatest availability. From: on behalf of "Simon Thompson (Research Computing - IT Services)" Reply-To: gpfsug main discussion list Date: Thursday, July 21, 2016 at 8:02 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario It depends. What are you protecting against? Either will work depending on your acceptable failure modes. I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site. In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written, as the NSD servers in Site 2 could write to the SAN in Site 1. In the first example you would need to restripe the file-system when bringing Site 1 back online to ensure data is replicated. Simon From: > on behalf of "Mark.Bush at siriuscom.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 13:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This is where my confusion sits. So if I have two sites, and two NSD Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NSD Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NSD Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). I know I'm being persistent but this for some reason confuses me.
Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image001.png at 01D1E327.B037C650] [cid:image002.png at 01D1E327.B037C650] [cid:image003.png at 01D1E327.B037C650] [cid:image004.png at 01D1E327.B037C650] [cid:image005.png at 01D1E327.B037C650] [cid:image006.png at 01D1E327.B037C650] [cid:image007.png at 01D1E327.B037C650] [cid:image008.png at 01D1E327.B037C650] [cid:image009.png at 01D1E327.B037C650] [cid:image010.png at 01D1E327.B037C650] [cid:image011.png at 01D1E327.B037C650] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ So in this scenario Ken, can server3 see any disks in site1? From: > on behalf of Ken Hill > Reply-To: gpfsug main discussion list > Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kcfor more information about quorum and tiebreaker disks. 
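To make that concrete, a replicated two-site layout with a third-site tiebreaker can be sketched roughly as follows; all node, NSD and device names here are hypothetical and the values are illustrative rather than a recipe:

# Hypothetical NSD stanza file (nsd.stanza): site 1 disks in failure group 1,
# site 2 disks in failure group 2, and a small descOnly LUN at the tiebreaker
# site in failure group 3.
%nsd: nsd=nsd_site1_01 device=/dev/mapper/lun_a1 servers=nsdnode1,nsdnode2 usage=dataAndMetadata failureGroup=1
%nsd: nsd=nsd_site2_01 device=/dev/mapper/lun_b1 servers=nsdnode3,nsdnode4 usage=dataAndMetadata failureGroup=2
%nsd: nsd=nsd_site3_desc device=/dev/sdb servers=quorumnode5 usage=descOnly failureGroup=3

mmcrnsd -F nsd.stanza
# Default and maximum replication of 2 for both data and metadata, so each
# main site holds a complete copy of the file system.
mmcrfs fs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2
# The third site only needs a quorum node holding the descOnly disk.
mmchnode --quorum -N quorumnode5

With three failure groups and two-way replication, losing either main site leaves the surviving site plus the tiebreaker with both a node majority and a majority of file system descriptors, which is the behaviour described above.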
Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems ________________________________ Phone:1-540-207-7270 E-mail: kenh at us.ibm.com [cid:image012.png at 01D1E327.B037C650] [cid:image013.png at 01D1E327.B037C650] [cid:image014.png at 01D1E327.B037C650] [cid:image015.png at 01D1E327.B037C650] [cid:image016.png at 01D1E327.B037C650] [cid:image017.png at 01D1E327.B037C650] [cid:image018.png at 01D1E327.B037C650] [cid:image019.png at 01D1E327.B037C650] [cid:image020.png at 01D1E327.B037C650] [cid:image021.png at 01D1E327.B037C650] [cid:image022.png at 01D1E327.B037C650] 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" > To: gpfsug main discussion list > Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1622 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 1598 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 1073 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
From makaplan at us.ibm.com Thu Jul 21 14:33:47 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 21 Jul 2016 09:33:47 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: I don't know. That said, let's be logical and cautious. Your network performance has got to be comparable to (preferably better than!) your disk/storage system. Think speed, latency, bandwidth, jitter, reliability, security. For a production system with data you care about, that probably means a dedicated/private/reserved channel, probably on private or leased fiber.
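As a rough way to put numbers on speed, latency, bandwidth and jitter before committing to a design, the inter-site link can be characterised with ordinary Linux tools; the host names below are hypothetical and the acceptable thresholds depend entirely on the workload:

# Round-trip time, jitter and packet loss between one NSD server at each site
ping -c 100 -i 0.2 site2-nsd01

# Sustained TCP throughput across the link (assumes iperf3 is installed at both ends)
iperf3 -s                           # run on site2-nsd01
iperf3 -c site2-nsd01 -t 60 -P 4    # run from a site 1 node with 4 parallel streams

The GPFS samples directory also ships an nsdperf tool (under /usr/lpp/mmfs/samples/net) that can be compiled to generate NSD-like traffic between nodes, which may give a more representative picture than a generic TCP test.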
Sure you can cobble together a demo, proof-of-concept, or prototype with less than that, but are you going to bet your career, life, friendships, data on that? Then you have to work through and test failure and recover scenarios... This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... Is there a sale or marketing team selling this? What do they recommend? Here is an excerpt from an IBM white paper I found by googling... Notice the qualifier "high quality wide area network": "...Synchronous replication works well for many workloads by replicating data across storage arrays within a data center, within a campus or across geographical distances using high quality wide area network connections. When wide area network connections are not high performance or are not reliable, an asynchronous approach to data replication is required. GPFS 3.5 introduces a feature called Active File Management (AFM). ..." Of course GPFS has improved (and been renamed!) since 3.5 but 4.2 cannot magically compensate for a not-so-high-quality network! From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:34 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? I?m not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about the what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Thu Jul 21 15:01:01 2016 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 21 Jul 2016 14:01:01 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: So just to be clear, my DCs are about 1.5kM as the fibre goes. We have dedicated extended SAN fibre and also private multi-10GbE links between the sites with Ethernet fabric switches. Simon From: > on behalf of Marc A Kaplan > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 21 July 2016 at 14:33 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] NDS in Two Site scenario This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Bush at siriuscom.com Thu Jul 21 15:01:49 2016 From: Mark.Bush at siriuscom.com (Mark.Bush at siriuscom.com) Date: Thu, 21 Jul 2016 14:01:49 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> Message-ID: Well said Marc. I think in IBM?s marketing pitches they make it sound so simple and easy. But this doesn?t take the place of well planned, tested, and properly sized implementations. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, July 21, 2016 at 8:33 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario I don't know. That said, let's be logical and cautious. Your network performance has got to be comparable to (preferably better than!) your disk/storage system. Think speed, latency, bandwidth, jitter, reliability, security. For a production system with data you care about, that probably means a dedicated/private/reserved channel, probably on private or leased fiber. Sure you can cobble together a demo, proof-of-concept, or prototype with less than that, but are you going to bet your career, life, friendships, data on that? Then you have to work through and test failure and recover scenarios... This forum would be one place to gather at least some anecdotes from power users/admins who might be running GPFS clusters spread over multiple kilometers... Is there a sale or marketing team selling this? What do they recommend? Here is an excerpt from an IBM white paper I found by googling... Notice the qualifier "high quality wide area network": "...Synchronous replication works well for many workloads by replicating data across storage arrays within a data center, within a campus or across geographical distances using high quality wide area network connections. When wide area network connections are not high performance or are not reliable, an asynchronous approach to data replication is required. GPFS 3.5 introduces a feature called Active File Management (AFM). ..." Of course GPFS has improved (and been renamed!) since 3.5 but 4.2 cannot magically compensate for a not-so-high-quality network! From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:34 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Marc, what you are saying is anything outside a particular data center shouldn?t be part of a cluster? 
I'm not sure marketing is in line with this then. From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:52 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Careful! You need to plan and test, test and plan both failure scenarios and performance under high network loads. I don't believe GPFS was designed with the idea of splitting clusters over multiple sites. If your inter-site network runs fast enough, and you can administer it well enough -- perhaps it will work well enough... Hint: Think about what the words "cluster" and "site" mean. GPFS does have the AFM feature, which was designed for multi-site deployments. This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From eboyd at us.ibm.com Thu Jul 21 15:39:18 2016 From: eboyd at us.ibm.com (Edward Boyd) Date: Thu, 21 Jul 2016 14:39:18 +0000 Subject: [gpfsug-discuss] NDS in Two Site scenario @ Mark Bush In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From sjhoward at iu.edu Thu Jul 21 16:21:04 2016 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Thu, 21 Jul 2016 15:21:04 +0000 Subject: [gpfsug-discuss] Performance Issues with SMB/NFS to GPFS Backend Message-ID: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> Hi All, I have a two-site replicated GPFS cluster running GPFS v3.5.0-26. We have recently run into a performance problem while exporting an SMB mount to one of our client labs. Specifically, this lab is attempting to run a MatLab SPM job in the SMB share and seeing sharply degraded performance versus running it over NFS to their own NFS service. The job does time-slice correction on MRI image volumes that result in roughly 15,000 file creates, plus at least one read and at least one write to each file. Here is a list that briefly describes the time-to-completion for this job, as run under various conditions:
1) Backed by their local fileserver, running over NFS - 5 min
2) Backed by our GPFS, running over SMB - 30 min
3) Backed by our GPFS, running over NFS - 20 min
4) Backed by local disk on our exporting protocol node, over SMB - 6 min
5) Backed by local disk on our exporting protocol node, over NFS - 6 min
6) Backed by GPFS, running over GPFS native client on our supercomputer - 2 min
From this list, it seems that the performance problems arise when combining either SMB or NFS with the GPFS backend.
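For a create-heavy, small-file workload like this, one set of knobs commonly examined on the exporting protocol node is the GPFS file and stat cache, since every create and lookup issued by the SMB or NFS server turns into GPFS metadata work; the node name and values below are purely illustrative and should be validated against available memory and IBM guidance for the release in use:

# Illustrative increases, applied only to the protocol node (hypothetical name);
# maxFilesToCache changes generally take effect after GPFS is restarted on that node.
mmchconfig maxFilesToCache=200000,maxStatCache=80000 -N protonode1
# A larger pagepool can also help absorb bursts of small creates and writes.
mmchconfig pagepool=8G -N protonode1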
It is our conclusion that neither SMB nor NFS per se create the problem, exporting a local disk share over either of these protocols yields decent performance. Do you have any insight as to why the combination of the GPFS back-end with either NFS or SMB yields such anemic performance? Can you offer any tuning recommendations that may improve the performance when running over SMB to the GPFS back-end (our preferred method of deployment)? Thank you so much for your help as always! Stewart Howard Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 21 16:44:17 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 21 Jul 2016 11:44:17 -0400 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> References: <8307febb40d24a58b036002224f27d4d@in-cci-exch09.ads.iu.edu> Message-ID: [Apologies] It has been pointed out to me that anyone seriously interested in clusters split over multiple sites should ReadTheFineManuals and in particular chapter 6 of the GPFS or Spectrum Scale Advanced Admin Guide. I apologize for anything I said that may have contradicted TFMs. Still it seems any which way you look at it - State of the art, today, this is not an easy plug and play, tab A into slot A, tab B into slot B and we're done - kinda-thing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Jul 21 17:04:48 2016 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Thu, 21 Jul 2016 16:04:48 +0000 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support Message-ID: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in particular the putacl and getacl functions) have no support for not following symlinks. Is there some hidden support for gpfs_putacl that will cause it to not deteference symbolic links? Something like the O_NOFOLLOW flag used elsewhere in linux? Thanks! -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Thu Jul 21 18:15:18 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 21 Jul 2016 17:15:18 +0000 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Message-ID: Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. 
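For anyone wanting to try peer snapshots, a minimal sketch follows; the file system and fileset names are hypothetical, and as the reply below explains this applies to single-writer caches:

# Check the AFM mode and state of the cache fileset
mmlsfileset fs1 sw_cache --afm -L
mmafmctl fs1 getstate -j sw_cache

# Create a peer snapshot: a snapshot of the cache fileset plus a matching
# snapshot of the corresponding path at home
mmpsnap fs1 create -j sw_cache

# List the snapshots that exist for the fileset
mmlssnapshot fs1 -j sw_cache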
From shankbal at in.ibm.com Fri Jul 22 01:51:53 2016 From: shankbal at in.ibm.com (Shankar Balasubramanian) Date: Fri, 22 Jul 2016 06:21:53 +0530 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Fri Jul 22 09:36:40 2016 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 22 Jul 2016 08:36:40 +0000 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: Hi Ash Our ifcfg files for the bonded interfaces (this applies to GPFS, data and mgmt networks) are set to mode1: BONDING_OPTS="mode=1 miimon=200" If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it setup right as we've never noticed an issue. The specific mode1 was asked for by our networks team. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan Sent: 21 July 2016 11:26 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience Dear all, Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. 
We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. Is there a recommended bonding mode? If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? Thank you, Regards, Ash -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ashish.thandavan at cs.ox.ac.uk Fri Jul 22 09:57:02 2016 From: ashish.thandavan at cs.ox.ac.uk (Ashish Thandavan) Date: Fri, 22 Jul 2016 09:57:02 +0100 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> Hi Richard, Thank you, that is very good to know! Regards, Ash On 22/07/16 09:36, Sobey, Richard A wrote: > Hi Ash > > Our ifcfg files for the bonded interfaces (this applies to GPFS, data and mgmt networks) are set to mode1: > > BONDING_OPTS="mode=1 miimon=200" > > If we have ever had a network outage on the ports for these interfaces, apart from pulling a cable for testing when they went in, then I guess we have it setup right as we've never noticed an issue. The specific mode1 was asked for by our networks team. > > Richard > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan > Sent: 21 July 2016 11:26 > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience > > Dear all, > > Please could anyone be able to point me at specifications required for the GPFS heartbeat network? Are there any figures for latency, jitter, etc that one should be aware of? > > I also have a related question about resilience. Our three GPFS NSD servers utilize a single network port on each server and communicate heartbeat traffic over a private VLAN. We are looking at improving the resilience of this setup by adding an additional network link on each server (going to a different member of a pair of stacked switches than the existing one) and running the heartbeat network over bonded interfaces on the three servers. Are there any recommendations as to which network bonding type to use? > > Based on the name alone, Mode 1 (active-backup) appears to be the ideal choice, and I believe the switches do not need any special configuration. 
However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might be the way to go; this aggregates the two ports and does require the relevant switch ports to be configured to support this. > Is there a recommended bonding mode? > > If anyone here currently uses bonded interfaces for their GPFS heartbeat traffic, may I ask what type of bond have you configured? Have you had any problems with the setup? And more importantly, has it been of use in keeping the cluster up and running in the scenario of one network link going down? > > Thank you, > > Regards, > Ash > > > > -- > ------------------------- > Ashish Thandavan > > UNIX Support Computing Officer > Department of Computer Science > University of Oxford > Wolfson Building > Parks Road > Oxford OX1 3QD > > Phone: 01865 610733 > Email: ashish.thandavan at cs.ox.ac.uk > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- ------------------------- Ashish Thandavan UNIX Support Computing Officer Department of Computer Science University of Oxford Wolfson Building Parks Road Oxford OX1 3QD Phone: 01865 610733 Email: ashish.thandavan at cs.ox.ac.uk From mimarsh2 at vt.edu Fri Jul 22 15:39:55 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Fri, 22 Jul 2016 10:39:55 -0400 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> <918adc41-85a0-6012-ddb8-7ef602773455@cs.ox.ac.uk> Message-ID: Sort of trailing on this thread - Is a bonded active-active 10gig ethernet network enough bandwidth to run data and heartbeat/admin on the same network? I assume it comes down to a question of latency and congestion but would like to hear others' stories. Is anyone doing anything fancy with QOS to make sure admin/heartbeat traffic is not delayed? All of our current clusters use Infiniband for data and mgt traffic, but we are building a cluster that has dual 10gigE to each compute node. The NSD servers have 40gigE connections to the core network where 10gigE switches uplink. On Fri, Jul 22, 2016 at 4:57 AM, Ashish Thandavan < ashish.thandavan at cs.ox.ac.uk> wrote: > Hi Richard, > > Thank you, that is very good to know! > > Regards, > Ash > > > On 22/07/16 09:36, Sobey, Richard A wrote: > >> Hi Ash >> >> Our ifcfg files for the bonded interfaces (this applies to GPFS, data and >> mgmt networks) are set to mode1: >> >> BONDING_OPTS="mode=1 miimon=200" >> >> If we have ever had a network outage on the ports for these interfaces, >> apart from pulling a cable for testing when they went in, then I guess we >> have it setup right as we've never noticed an issue. The specific mode1 was >> asked for by our networks team. >> >> Richard >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org [mailto: >> gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan >> Sent: 21 July 2016 11:26 >> To: gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] GPFS heartbeat network specifications and >> resilience >> >> Dear all, >> >> Please could anyone be able to point me at specifications required for >> the GPFS heartbeat network? 
Are there any figures for latency, jitter, etc >> that one should be aware of? >> >> I also have a related question about resilience. Our three GPFS NSD >> servers utilize a single network port on each server and communicate >> heartbeat traffic over a private VLAN. We are looking at improving the >> resilience of this setup by adding an additional network link on each >> server (going to a different member of a pair of stacked switches than the >> existing one) and running the heartbeat network over bonded interfaces on >> the three servers. Are there any recommendations as to which network >> bonding type to use? >> >> Based on the name alone, Mode 1 (active-backup) appears to be the ideal >> choice, and I believe the switches do not need any special configuration. >> However, it has been suggested that Mode 4 (802.3ad) or LACP bonding might >> be the way to go; this aggregates the two ports and does require the >> relevant switch ports to be configured to support this. >> Is there a recommended bonding mode? >> >> If anyone here currently uses bonded interfaces for their GPFS heartbeat >> traffic, may I ask what type of bond have you configured? Have you had any >> problems with the setup? And more importantly, has it been of use in >> keeping the cluster up and running in the scenario of one network link >> going down? >> >> Thank you, >> >> Regards, >> Ash >> >> >> >> -- >> ------------------------- >> Ashish Thandavan >> >> UNIX Support Computing Officer >> Department of Computer Science >> University of Oxford >> Wolfson Building >> Parks Road >> Oxford OX1 3QD >> >> Phone: 01865 610733 >> Email: ashish.thandavan at cs.ox.ac.uk >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > ------------------------- > Ashish Thandavan > > UNIX Support Computing Officer > Department of Computer Science > University of Oxford > Wolfson Building > Parks Road > Oxford OX1 3QD > > Phone: 01865 610733 > Email: ashish.thandavan at cs.ox.ac.uk > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Fri Jul 22 17:25:49 2016 From: chekh at stanford.edu (Alex Chekholko) Date: Fri, 22 Jul 2016 09:25:49 -0700 Subject: [gpfsug-discuss] GPFS heartbeat network specifications and resilience In-Reply-To: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> References: <87f9fc37-b12a-2ef9-14b7-271480092dae@cs.ox.ac.uk> Message-ID: <81342a19-1cec-14de-9d7f-176ff7511511@stanford.edu> Hi Ashish, Can you describe more about what problem you are trying to solve? And what failure mode you are trying to avoid? GPFS depends on uninterrupted network access between the cluster members (well, mainly between each cluster member and the current cluster manager node), but there are many ways to ensure that, and many ways to recover from interruptions. e.g. we tend to set minMissedPingTimeout 30 pingPeriod 5 Bump those up if network/system gets busy. Performance and latency will suffer but at least cluster members won't be expelled. 
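In command form, the settings Alex mentions would be applied cluster-wide along the lines of the following; the values are simply the ones quoted above, not a recommendation, and since these tunables are not widely documented it is worth confirming the behaviour for your release with IBM support before changing them:

# pingPeriod: how often (seconds) the cluster manager pings each node.
# minMissedPingTimeout: floor (seconds) on how long missed pings are tolerated
# before a node is declared dead and expelled.
mmchconfig pingPeriod=5,minMissedPingTimeout=30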
Regards, Alex On 07/21/2016 03:26 AM, Ashish Thandavan wrote: > Dear all, > > Please could anyone be able to point me at specifications required for > the GPFS heartbeat network? Are there any figures for latency, jitter, > etc that one should be aware of? > > I also have a related question about resilience. Our three GPFS NSD > servers utilize a single network port on each server and communicate > heartbeat traffic over a private VLAN. We are looking at improving the > resilience of this setup by adding an additional network link on each > server (going to a different member of a pair of stacked switches than > the existing one) and running the heartbeat network over bonded > interfaces on the three servers. Are there any recommendations as to > which network bonding type to use? > > Based on the name alone, Mode 1 (active-backup) appears to be the ideal > choice, and I believe the switches do not need any special > configuration. However, it has been suggested that Mode 4 (802.3ad) or > LACP bonding might be the way to go; this aggregates the two ports and > does require the relevant switch ports to be configured to support this. > Is there a recommended bonding mode? > > If anyone here currently uses bonded interfaces for their GPFS heartbeat > traffic, may I ask what type of bond have you configured? Have you had > any problems with the setup? And more importantly, has it been of use in > keeping the cluster up and running in the scenario of one network link > going down? > > Thank you, > > Regards, > Ash > > > -- Alex Chekholko chekh at stanford.edu From volobuev at us.ibm.com Fri Jul 22 18:56:31 2016 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Fri, 22 Jul 2016 10:56:31 -0700 Subject: [gpfsug-discuss] NDS in Two Site scenario In-Reply-To: <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> References: <39AFAC3E-66B8-4572-BE2E-7DC9F8B034D8@siriuscom.com> <7547EE20-C302-4892-85EA-CD4D045B0085@siriuscom.com> Message-ID: There are multiple ways to accomplish active-active two-side synchronous DR, aka "stretch cluster". The most common approach is to have 3 sites: two main sites A and B, plus tiebreaker site C. The two main sites host all data/metadata disks and each has some even number of quorum nodes. There's no stretched SAN, each site has its own set of NSDs defined. The tiebreaker site consists of a single quorum node with a small descOnly LUN. In this config, any of the 3 sites can do down or be disconnected from the rest without affecting the other two. The tiebreaker site is essential: it provides a quorum node for node majority quorum to function, and a descOnly disk for the file system descriptor quorum. Technically speaking, one do away with the need to have a quorum node at site C by using "minority quorum", i.e. tiebreaker disks, but this model is more complex and it is harder to predict its behavior under various failure conditions. The basic problem with the minority quorum is that it allows a minority of nodes to win in a network partition scenario, just like the name implies. In the extreme case this leads to the "dictator problem", when a single partitioned node could manage to win the disk election and thus kick everyone else out. And since a tiebreaker disk needs to be visible from all quorum nodes, you do need a stretched SAN that extends between sites. The classic active-active stretch cluster only requires a good TCP/IP network. The question that gets asked a lot is "how good should be network connection between sites be". 
There's no simple answer, unfortunately. It would be completely impractical to try to frame this in simple thresholds. The worse the network connection is, the more pain it produces, but everyone has a different level of pain tolerance. And everyone's workload is different. In any GPFS configuration that uses data replication, writes are impacted far more by replication than reads. So a read-mostly workload may run fine with a dodgy inter-site link, while a write-heavy workload may just run into the ground, as IOs may be submitted faster than they could be completed. The buffering model could make a big difference. An application that does a fair amount of write bursts, with those writes being buffered in a generously sized pagepool, may perform acceptably, while a different application that uses O_SYNC or O_DIRECT semantics for writes may run a lot worse, all other things being equal. As long as all nodes can renew their disk leases within the configured disk lease interval (35 sec by default), GPFS will basically work, so the absolute threshold for the network link quality is not particularly stringent, but beyond that it all depends on your workload and your level of pain tolerance. Practically speaking, you want a network link with low-double-digits RTT at worst, almost no packet loss, and bandwidth commensurate with your application IO needs (fudged some to allow for write amplification -- another factor that's entirely workload-dependent). So a link with, say, 100ms RTT and 2% packet loss is not going to be usable to almost anyone, in my opinion, a link with 30ms RTT and 0.1% packet loss may work for some undemanding read-mostly workloads, and so on. So you pretty much have to try it out to see. The disk configuration is another tricky angle. The simplest approach is to have two groups of data/metadata NSDs, on sites A and B, and not have any sort of SAN reaching across sites. Historically, such a config was actually preferred over a stretched SAN, because it allowed for a basic site topology definition. When multiple replicas of the same logical block are present, it is obviously better/faster to read the replica that resides on a disk that's local to a given site. This is conceptually simple, but how would GPFS know what a site is and what disks are local vs remote? To GPFS, all disks are equal. Historically, the readReplicaPolicy=local config parameter was put forward to work around the problem. The basic idea was: if the reader node is on the same subnet as the primary NSD server for a given replica, this replica is "local", and is thus preferred. This sort of works, but requires a very specific network configuration, which isn't always practical. Starting with GPFS 4.1.1, GPFS implements readReplicaPolicy=fastest, where the best replica for reads is picked based on observed disk IO latency. This is more general and works for all disk topologies, including a stretched SAN. yuri From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list , Date: 07/21/2016 05:45 AM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org This is where my confusion sits. So if I have two sites, and two NDS Nodes per site with 1 NSD (to keep it simple), do I just present the physical LUN in Site1 to Site1 NDS Nodes and physical LUN in Site2 to Site2 NSD Nodes? Or is it that I present physical LUN in Site1 to all 4 NDS Nodes and the same at Site2? (Assuming SAN and not direct attached in this case). 
I know I?m being persistent but this for some reason confuses me. Site1 NSD Node1 ---NSD1 ---Physical LUN1 from SAN1 NSD Node2 Site2 NSD Node3 ---NSD2 ?Physical LUN2 from SAN2 NSD Node4 Or Site1 NSD Node1 ----NSD1 ?Physical LUN1 from SAN1 ----NSD2 ?Physical LUN2 from SAN2 NSD Node2 Site 2 NSD Node3 ---NSD2 ? Physical LUN2 from SAN2 ---NSD1 --Physical LUN1 from SAN1 NSD Node4 Site 3 Node5 Quorum From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 7:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Yes - it is a cluster. The sites should NOT be further than a MAN - or Campus network. If you're looking to do this over a large distance - it would be best to choose another GPFS solution (Multi-Cluster, AFM, etc). Regards, Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 07:33 PM Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org So in this scenario Ken, can server3 see any disks in site1? From: on behalf of Ken Hill Reply-To: gpfsug main discussion list Date: Wednesday, July 20, 2016 at 4:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NDS in Two Site scenario Site1 Site2 Server1 (quorum 1) Server3 (quorum 2) Server2 Server4 SiteX Server5 (quorum 3) You need to set up another site (or server) that is at least power isolated (if not completely infrastructure isolated) from Site1 or Site2. You would then set up a quorum node at that site | location. This insures you can still access your data even if one of your sites go down. You can further isolate failure by increasing quorum (odd numbers). The way quorum works is: The majority of the quorum nodes need to be up to survive an outage. - With 3 quorum nodes you can have 1 quorum node failures and continue filesystem operations. - With 5 quorum nodes you can have 2 quorum node failures and continue filesystem operations. - With 7 quorum nodes you can have 3 quorum node failures and continue filesystem operations. - etc Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks. Ken Hill Technical Sales Specialist | Software Defined Solution Sales IBM Systems Phone:1-540-207-7270 E-mail: kenh at us.ibm.com 2300 Dulles Station Blvd Herndon, VA 20171-6133 United States From: "Mark.Bush at siriuscom.com" To: gpfsug main discussion list Date: 07/20/2016 04:47 PM Subject: [gpfsug-discuss] NDS in Two Site scenario Sent by: gpfsug-discuss-bounces at spectrumscale.org For some reason this concept is a round peg that doesn?t fit the square hole inside my brain. Can someone please explain the best practice to setting up two sites same cluster? I get that I would likely have two NDS nodes in site 1 and two NDS nodes in site two. What I don?t understand are the failure scenarios and what would happen if I lose one or worse a whole site goes down. Do I solve this by having scale replication set to 2 for all my files? I mean a single site I think I get it?s when there are two datacenters and I don?t want two clusters typically. Mark R. 
Bush| Solutions Architect Mobile: 210.237.8415 | mark.bush at siriuscom.com Sirius Computer Solutions | www.siriuscom.com 10100 Reunion Place, Suite 500, San Antonio, TX 78216 This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. This message may be viewed by parties at Sirius Computer Solutions other than those named in the message header. This message does not contain an official representation of Sirius Computer Solutions. If you have received this communication in error, notify Sirius Computer Solutions immediately and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you. Sirius Computer Solutions_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From dhildeb at us.ibm.com Fri Jul 22 19:00:23 2016 From: dhildeb at us.ibm.com (Dean Hildebrand) Date: Fri, 22 Jul 2016 11:00:23 -0700 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: Just to expand a bit on the use of peer snapshots. The point of psnap is to create a snapshot in the cache that is identical to a snapshot on the home. This way you can recover files from a snapshot of a fileset on the 'replica' of the data just like you can from a snapshot in the 'cache' (where the data was generated). With IW mode, its typically possible that the data could be changing on the home from another cache or clients directly running on the data on the home. In this case, it would be impossible to ensure that the snapshots in the cache and on the home are identical. Dean From: "Shankar Balasubramanian" To: gpfsug main discussion list Date: 07/21/2016 05:52 PM Subject: Re: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network.
Best Regards,
Shankar Balasubramanian
STSM, AFM & Async DR Development
IBM Systems
Bangalore - Embassy Golf Links
India

From: Luke Raimbach 
To: gpfsug main discussion list 
Date: 07/21/2016 10:45 PM
Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Dear GPFS Experts,

I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming....

What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS?

Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers?

Like Barry says, new AFM toys are great. Can we have more, please?

Cheers,
Luke.

Luke Raimbach
Senior HPC Data and Storage Systems Engineer,
The Francis Crick Institute,
Gibbs Building, 215 Euston Road, London NW1 2BE.
E: luke.raimbach at crick.ac.uk
W: www.crick.ac.uk

The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE.

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Fri Jul 22 23:36:46 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 22 Jul 2016 18:36:46 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> Message-ID: <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Thanks Yuri! I do wonder what security implications this might have for the policy engine where a nefarious user could trick it into performing an action on another file via symlink hijacking. Truthfully I've been more worried about an accidental hijack rather than someone being malicious. I'll open an RFE for it since I think it would be nice to have. (While I'm at it, I think I'll open another for having chown call exposed via the API). -Aaron On 7/22/16 3:24 PM, Yuri L Volobuev wrote: > In a word, no. I can't blame anyone for suspecting that there's yet > another hidden flag somewhere, given our track record, but there's > nothing hidden on this one, there's just no code to implement > O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be > a reasonable thing to have, so if you feel strongly enough about it to > open an RFE, go for it. > > yuri > > Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER > SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, > Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 > AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) > and API calls (in particular the > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list , > Date: 07/21/2016 09:05 AM > Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in > particular the putacl and getacl functions) have no support for not > following symlinks. Is there some hidden support for gpfs_putacl that > will cause it to not deteference symbolic links? Something like the > O_NOFOLLOW flag used elsewhere in linux? > > Thanks! > > -Aaron_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: OpenPGP digital signature URL: From aaron.s.knister at nasa.gov Sat Jul 23 05:46:30 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sat, 23 Jul 2016 00:46:30 -0400 Subject: [gpfsug-discuss] inode update delay? Message-ID: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. 
I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From shankbal at in.ibm.com Fri Jul 22 08:53:51 2016 From: shankbal at in.ibm.com (Shankar Balasubramanian) Date: Fri, 22 Jul 2016 13:23:51 +0530 Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol In-Reply-To: References: Message-ID: One correction to the note below, peer snapshots are not supported when AFM use GPFS protocol. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India From: Shankar Balasubramanian/India/IBM at IBMIN To: gpfsug main discussion list Date: 07/22/2016 06:22 AM Subject: Re: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Peer snapshots will not work for independent writers as they are purely meant for Single Write. It will work both for GPFS and NFS protocols. For AFM to work well with GPFS protocol, you should have high bandwidth, reliable network. Best Regards, Shankar Balasubramanian STSM, AFM & Async DR Development IBM Systems Bangalore - Embassy Golf Links India Inactive hide details for Luke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM anLuke Raimbach ---07/21/2016 10:45:51 PM---Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just k From: Luke Raimbach To: gpfsug main discussion list Date: 07/21/2016 10:45 PM Subject: [gpfsug-discuss] AFM Peer Snapshots over GPFS Protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFS Experts, I've worn my brain out today with AFM and Windows ACLs, but more questions just keep on coming.... What's the story with Peer Snapshot functionality over GPFS protocol? Will it ever work, or should I crank my nice setup down a notch to NFS? Are peer snapshots supposed to work over NFS for Independent Writer mode AFM caches, or just Single Writers? Like Barry says, new AFM toys are great. Can we have more, please? Cheers, Luke, Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 
1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From MKEIGO at jp.ibm.com Sun Jul 24 03:31:05 2016 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Sun, 24 Jul 2016 11:31:05 +0900 Subject: [gpfsug-discuss] inode update delay? In-Reply-To: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: Hi Aaron, I think the product is designed so that some inode fields are not propagated among nodes instantly in order to avoid unnecessary overhead within the cluster. See: Exceptions to Open Group technical standards - IBM Spectrum Scale: Administration and Programming Reference - IBM Spectrum Scale 4.2 - IBM Knowledge Center https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_xopen.htm --- Keigo Matsubara, Industry Architect, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 From: Aaron Knister To: Date: 2016/07/23 13:47 Subject: [gpfsug-discuss] inode update delay? Sent by: gpfsug-discuss-bounces at spectrumscale.org I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stef.coene at docum.org Sun Jul 24 11:27:28 2016 From: stef.coene at docum.org (Stef Coene) Date: Sun, 24 Jul 2016 12:27:28 +0200 Subject: [gpfsug-discuss] New to GPFS Message-ID: <57949810.2030002@docum.org> Hi, Like the subject says, I'm new to Spectrum Scale. We are considering GPFS as back end for CommVault back-up data. Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client (Ubuntu) as test on ESXi 6. The RHEL servers are upgraded to 7.2. Will that be a problem or not? I saw some posts that there is an issue with RHEL 7.2.... Stef From makaplan at us.ibm.com Sun Jul 24 16:11:06 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 24 Jul 2016 11:11:06 -0400 Subject: [gpfsug-discuss] inode update delay? / mmapplypolicy In-Reply-To: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: mmapplypolicy uses the inodescan API which to gain overall speed, bypasses various buffers, caches, locks, ... and just reads inodes "directly" from disk. So the "view" of inodescan is somewhat "behind" the overall state of the live filesystem as viewed from the usual Posix APIs, such as stat(2). (Not to worry, all metadata updates are logged, so in event of a power loss or OS crash, GPFS recovers a consistent state from its log files...) This is at least mentioned in the docs. `mmfsctl suspend-write; mmfsctl resume;` is the only practical way I know to guarantee a forced a flush of all "dirty" buffers to disk -- any metadata updates before the suspend will for sure become visible to an inodescan after the resume. (Classic `sync` is not quite the same...) But think about this --- scanning a "live" file system is always somewhat iffy-dodgy and the result is smeared over the time of the scan -- if there are any concurrent changes during the scan your results are imprecise. An alternative is to use `mmcrsnapshot` and scan the snapshot. From: Aaron Knister To: Date: 07/23/2016 12:46 AM Subject: [gpfsug-discuss] inode update delay? Sent by: gpfsug-discuss-bounces at spectrumscale.org I've noticed that there can be a several minute delay between the time changes to an inode occur and when those changes are reflected in the results of an inode scan. I've been working on code that checks ia_xperm to determine if a given file has extended acl entries and noticed in testing it that the acl flag wasn't getting set immediately after giving a file an acl. Here's what I mean: # cd /gpfsm/dnb32 # date; setfacl -b acltest* Sat Jul 23 00:24:57 EDT 2016 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:24:59 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:10 EDT 2016 5 # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l Sat Jul 23 00:25:21 EDT 2016 0 I'm a little confused about what's going on here-- is there some kind of write-behind for inode updates? Is there a way I can cause the cluster to quiesce and flush all pending inode updates (an mmfsctl suspend and resume seem to have this effect but I was looking for something a little less user-visible)? If I access the directory containing the files from another node via the VFS mount then the update appears immediately in the inode scan. A mere inode scan from another node w/o touching the filesystem mount doesn't necessarily seem to trigger this behavior. Thanks! 
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Jul 24 16:54:16 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 24 Jul 2016 11:54:16 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Message-ID: Regarding "policy engine"/inodescan and symbolic links. 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be tested to see if an inode/file is a symlink or not. 2. Default behaviour for mmapplypolicy is to skip over symlinks. You must specify... DIRECTORIES_PLUS which ... Indicates that non-regular file objects (directories, symbolic links, and so on) should be included in the list. If not specified, only ordinary data files are included in the candidate lists. 3. You can apply Linux commands and APIs to GPFS pathnames. 4. Of course, if you need to get at a GPFS feature or attribute that is not supported by Linux ... 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, but neither does it set the ACL for the symlink... Googling... some people consider this to be a bug, but maybe it is a feature... --marc From: Aaron Knister To: Date: 07/22/2016 06:37 PM Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Yuri! I do wonder what security implications this might have for the policy engine where a nefarious user could trick it into performing an action on another file via symlink hijacking. Truthfully I've been more worried about an accidental hijack rather than someone being malicious. I'll open an RFE for it since I think it would be nice to have. (While I'm at it, I think I'll open another for having chown call exposed via the API). -Aaron On 7/22/16 3:24 PM, Yuri L Volobuev wrote: > In a word, no. I can't blame anyone for suspecting that there's yet > another hidden flag somewhere, given our track record, but there's > nothing hidden on this one, there's just no code to implement > O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be > a reasonable thing to have, so if you feel strongly enough about it to > open an RFE, go for it. > > yuri > > Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER > SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, > Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 > AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) > and API calls (in particular the > > From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" > > To: gpfsug main discussion list , > Date: 07/21/2016 09:05 AM > Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in > particular the putacl and getacl functions) have no support for not > following symlinks. 
Is there some hidden support for gpfs_putacl that > will cause it to not deteference symbolic links? Something like the > O_NOFOLLOW flag used elsewhere in linux? > > Thanks! > > -Aaron_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Mon Jul 25 00:15:02 2016 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Sun, 24 Jul 2016 23:15:02 +0000 Subject: [gpfsug-discuss] New to GPFS In-Reply-To: <57949810.2030002@docum.org> References: <57949810.2030002@docum.org> Message-ID: <43489850f79c446c9d9896292608a292@exch1-cdc.nexus.csiro.au> The issue is with the Protocols version of GPFS. I am using the non-protocols version 4.2.0.3 successfully on CentOS 7.2. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Stef Coene Sent: Sunday, 24 July 2016 8:27 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] New to GPFS Hi, Like the subject says, I'm new to Spectrum Scale. We are considering GPFS as back end for CommVault back-up data. Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client (Ubuntu) as test on ESXi 6. The RHEL servers are upgraded to 7.2. Will that be a problem or not? I saw some posts that there is an issue with RHEL 7.2.... Stef _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mweil at wustl.edu Mon Jul 25 15:56:52 2016 From: mweil at wustl.edu (Matt Weil) Date: Mon, 25 Jul 2016 09:56:52 -0500 Subject: [gpfsug-discuss] New to GPFS In-Reply-To: <57949810.2030002@docum.org> References: <57949810.2030002@docum.org> Message-ID: On 7/24/16 5:27 AM, Stef Coene wrote: > Hi, > > Like the subject says, I'm new to Spectrum Scale. > > We are considering GPFS as back end for CommVault back-up data. > Back-end storage will be iSCSI (300 TB) and V5000 SAS (100 TB). > I created a 2 node cluster (RHEL) with 2 protocol nodes and 1 client > (Ubuntu) as test on ESXi 6. > > The RHEL servers are upgraded to 7.2. Will that be a problem or not? > I saw some posts that there is an issue with RHEL 7.2.... we had to upgrade to 4.2.0.3 when running RHEL 7.2 > > > Stef > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. 
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From aaron.s.knister at nasa.gov Mon Jul 25 20:50:54 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 25 Jul 2016 15:50:54 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov> <9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> Message-ID: <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> Thanks Marc. In my mind the issue is a timing one between the moment the policy engine decides to perform an action on a file (e.g. matching the path inode/gen number with that from the inode scan) and when it actually takes that action by calling an api call that takes a path as an argument. Your suggestion in #3 is the route I think I'm going to take here since I can call acl_get_fd after calling open/openat with O_NOFOLLOW. On 7/24/16 11:54 AM, Marc A Kaplan wrote: > Regarding "policy engine"/inodescan and symbolic links. > > 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be > tested to see if an inode/file is a symlink or not. > > 2. Default behaviour for mmapplypolicy is to skip over symlinks. You > must specify... > > *DIRECTORIES_PLUS which ...* > > Indicates that non-regular file objects (directories, symbolic links, > and so on) should be included in > the list. If not specified, only ordinary data files are included in the > candidate lists. > > 3. You can apply Linux commands and APIs to GPFS pathnames. > > 4. Of course, if you need to get at a GPFS feature or attribute that is > not supported by Linux ... > > 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, > but neither does it set the ACL for the symlink... > Googling... some people consider this to be a bug, but maybe it is a > feature... > > --marc > > > > From: Aaron Knister > To: > Date: 07/22/2016 06:37 PM > Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Thanks Yuri! I do wonder what security implications this might have for > the policy engine where a nefarious user could trick it into performing > an action on another file via symlink hijacking. Truthfully I've been > more worried about an accidental hijack rather than someone being > malicious. I'll open an RFE for it since I think it would be nice to > have. (While I'm at it, I think I'll open another for having chown call > exposed via the API). > > -Aaron > > On 7/22/16 3:24 PM, Yuri L Volobuev wrote: >> In a word, no. I can't blame anyone for suspecting that there's yet >> another hidden flag somewhere, given our track record, but there's >> nothing hidden on this one, there's just no code to implement >> O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be >> a reasonable thing to have, so if you feel strongly enough about it to >> open an RFE, go for it. >> >> yuri >> >> Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER >> SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, >> Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 >> AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) >> and API calls (in particular the >> >> From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" >> >> To: gpfsug main discussion list , >> Date: 07/21/2016 09:05 AM >> Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Everyone, >> >> I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in >> particular the putacl and getacl functions) have no support for not >> following symlinks. Is there some hidden support for gpfs_putacl that >> will cause it to not deteference symbolic links? Something like the >> O_NOFOLLOW flag used elsewhere in linux? >> >> Thanks! >> >> -Aaron_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Mon Jul 25 20:57:25 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 25 Jul 2016 15:57:25 -0400 Subject: [gpfsug-discuss] inode update delay? / mmapplypolicy In-Reply-To: References: <1062b3f4-21bf-9002-f7a5-d69a26f466b4@nasa.gov> Message-ID: <291e1237-98d6-2abe-b1af-8898da61629f@nasa.gov> Thanks again, Marc. You're quite right about the results being smeared over time on a live filesystem even if the inodescan didn't lag behind slightly. The use case here is a mass uid number migration. File ownership is easy because I can be guaranteed after a certain point in time that no new files under the user's old uid number can be created. However, in part because of inheritance I'm not so lucky when it comes to ACLs. I almost need to do 2 passes when looking at the ACLs but even that's not guaranteed to catch everything. Using a snapshot is an interesting idea to give me a stable point in time snapshot to determine if I got everything. -Aaron On 7/24/16 11:11 AM, Marc A Kaplan wrote: > mmapplypolicy uses the inodescan API which to gain overall speed, > bypasses various buffers, caches, locks, ... and just reads inodes > "directly" from disk. > > So the "view" of inodescan is somewhat "behind" the overall state of the > live filesystem as viewed from the usual Posix APIs, such as stat(2). > (Not to worry, all metadata updates are logged, so in event of a power > loss or OS crash, GPFS recovers a consistent state from its log files...) > > This is at least mentioned in the docs. 
> > `mmfsctl suspend-write; mmfsctl resume;` is the only practical way I > know to guarantee a forced a flush of all "dirty" buffers to disk -- any > metadata updates before the suspend will for sure > become visible to an inodescan after the resume. (Classic `sync` is not > quite the same...) > > But think about this --- scanning a "live" file system is always > somewhat iffy-dodgy and the result is smeared over the time of the scan > -- if there are any concurrent changes > during the scan your results are imprecise. > > An alternative is to use `mmcrsnapshot` and scan the snapshot. > > > > > From: Aaron Knister > To: > Date: 07/23/2016 12:46 AM > Subject: [gpfsug-discuss] inode update delay? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > I've noticed that there can be a several minute delay between the time > changes to an inode occur and when those changes are reflected in the > results of an inode scan. I've been working on code that checks ia_xperm > to determine if a given file has extended acl entries and noticed in > testing it that the acl flag wasn't getting set immediately after giving > a file an acl. Here's what I mean: > > # cd /gpfsm/dnb32 > > # date; setfacl -b acltest* > Sat Jul 23 00:24:57 EDT 2016 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:24:59 EDT 2016 > 5 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:25:10 EDT 2016 > 5 > > # date; /usr/lpp/mmfs/samples/util/tsinode /gpfsm/dnb32 | egrep acl | wc -l > Sat Jul 23 00:25:21 EDT 2016 > 0 > > I'm a little confused about what's going on here-- is there some kind of > write-behind for inode updates? Is there a way I can cause the cluster > to quiesce and flush all pending inode updates (an mmfsctl suspend and > resume seem to have this effect but I was looking for something a little > less user-visible)? If I access the directory containing the files from > another node via the VFS mount then the update appears immediately in > the inode scan. A mere inode scan from another node w/o touching the > filesystem mount doesn't necessarily seem to trigger this behavior. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From makaplan at us.ibm.com Mon Jul 25 22:46:01 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 25 Jul 2016 17:46:01 -0400 Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support In-Reply-To: <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> References: <5F910253243E6A47B81A9A2EB424BBA101C70CBB@NDMSMBX404.ndc.nasa.gov><9e4f92dd-c3d7-a50e-69e0-6c34dfa1c46d@nasa.gov> <90c6b452-2df4-b217-eaaf-2f991bb2921e@nasa.gov> Message-ID: Unfortunately there is always a window of time between testing the file and acting on the file's pathname. At any moment after testing (finding) ... the file could change, or the same pathname could be pointing to a different inode/file. 
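For example, a typical test-then-act pipeline has exactly this window: nothing prevents a pathname selected by the test step from being replaced, say by a symbolic link to some other file, before the action step runs. (A sketch only; the path, UID and ACL entry are hypothetical.)

# find /gpfs/fs0/data -type f -uid 1001 -print0 | xargs -0 setfacl -m u:newowner:r--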
That is a potential problem with just about every Unix file utility and/or script you put together with the standard commands... find ... | xargs ... mmapplypolicy has the -e option to narrow the window by retesting just before executing an action. Of course it's seldom a real problem -- you have to think about scenarios where two minds are working within the same namespace of files and then they are doing so either carelessly without communicating or one is deliberately trying to cause trouble for the other! From: Aaron Knister To: Date: 07/25/2016 03:51 PM Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Marc. In my mind the issue is a timing one between the moment the policy engine decides to perform an action on a file (e.g. matching the path inode/gen number with that from the inode scan) and when it actually takes that action by calling an api call that takes a path as an argument. Your suggestion in #3 is the route I think I'm going to take here since I can call acl_get_fd after calling open/openat with O_NOFOLLOW. On 7/24/16 11:54 AM, Marc A Kaplan wrote: > Regarding "policy engine"/inodescan and symbolic links. > > 1. Either the MODE and MISC_ATTRIBUTES properties (SQL variables) can be > tested to see if an inode/file is a symlink or not. > > 2. Default behaviour for mmapplypolicy is to skip over symlinks. You > must specify... > > *DIRECTORIES_PLUS which ...* > > Indicates that non-regular file objects (directories, symbolic links, > and so on) should be included in > the list. If not specified, only ordinary data files are included in the > candidate lists. > > 3. You can apply Linux commands and APIs to GPFS pathnames. > > 4. Of course, if you need to get at a GPFS feature or attribute that is > not supported by Linux ... > > 5. Hmmm... on my Linux system `setfacl -P ...` does not follow symlinks, > but neither does it set the ACL for the symlink... > Googling... some people consider this to be a bug, but maybe it is a > feature... > > --marc > > > > From: Aaron Knister > To: > Date: 07/22/2016 06:37 PM > Subject: Re: [gpfsug-discuss] GPFS API O_NOFOLLOW support > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Thanks Yuri! I do wonder what security implications this might have for > the policy engine where a nefarious user could trick it into performing > an action on another file via symlink hijacking. Truthfully I've been > more worried about an accidental hijack rather than someone being > malicious. I'll open an RFE for it since I think it would be nice to > have. (While I'm at it, I think I'll open another for having chown call > exposed via the API). > > -Aaron > > On 7/22/16 3:24 PM, Yuri L Volobuev wrote: >> In a word, no. I can't blame anyone for suspecting that there's yet >> another hidden flag somewhere, given our track record, but there's >> nothing hidden on this one, there's just no code to implement >> O_NOFOLLOW. This isn't Posix, and we just never put it in. This would be >> a reasonable thing to have, so if you feel strongly enough about it to >> open an RFE, go for it. >> >> yuri >> >> Inactive hide details for "Knister, Aaron S. (GSFC-606.2)[COMPUTER >> SCIENCE CORP]" ---07/21/2016 09:05:11 AM---Hi Everyone, I've"Knister, >> Aaron S. 
(GSFC-606.2)[COMPUTER SCIENCE CORP]" ---07/21/2016 09:05:11 >> AM---Hi Everyone, I've noticed that many GPFS commands (mm*acl,mm*attr) >> and API calls (in particular the >> >> From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" >> >> To: gpfsug main discussion list , >> Date: 07/21/2016 09:05 AM >> Subject: [gpfsug-discuss] GPFS API O_NOFOLLOW support >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> ------------------------------------------------------------------------ >> >> >> >> Hi Everyone, >> >> I've noticed that many GPFS commands (mm*acl,mm*attr) and API calls (in >> particular the putacl and getacl functions) have no support for not >> following symlinks. Is there some hidden support for gpfs_putacl that >> will cause it to not deteference symbolic links? Something like the >> O_NOFOLLOW flag used elsewhere in linux? >> >> Thanks! >> >> -Aaron_______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > > [attachment "signature.asc" deleted by Marc A Kaplan/Watson/IBM] > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Luke.Raimbach at crick.ac.uk Tue Jul 26 15:17:35 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Tue, 26 Jul 2016 14:17:35 +0000 Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
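The prefetch into cache B is driven with mmafmctl against the cache fileset, roughly like this (a sketch only; the file system, fileset and list-file names are made up, and the exact options should be checked against the mmafmctl man page for 4.2):

# mmafmctl fs0 prefetch -j cacheB --list-file /tmp/migration-batch.list
# mmafmctl fs0 getstate -j cacheB

The first command queues the listed files for fetching into the cache; the second shows the fileset state and queue length while the fetch runs.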
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From bbanister at jumptrading.com Wed Jul 27 18:37:37 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 17:37:37 +0000 Subject: [gpfsug-discuss] CCR troubles Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I'll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Jul 27 19:03:05 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 27 Jul 2016 14:03:05 -0400 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. 
can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Jul 27 23:29:19 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 22:29:19 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... 
From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From gmcpheeters at anl.gov  Wed Jul 27 23:34:50 2016
From: gmcpheeters at anl.gov (McPheeters, Gordon)
Date: Wed, 27 Jul 2016 22:34:50 +0000
Subject: [gpfsug-discuss] CCR troubles
In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com>
References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com>
 <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com>
Message-ID: <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov>

mmchcluster has an option:

--ccr-disable
Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR.

-Gordon

On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote:

Hi Marc,

I do understand the principle you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue?

Thanks for the response!
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan
Sent: Wednesday, July 27, 2016 1:03 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CCR troubles

I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration parameters. You do need at least a majority of the quorum nodes to be up and in communication (e.g. can talk to each other by tcp/ip).

--ccr-enable
Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible.

Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...): to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you; I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...)

I advise that you do some testing on a test cluster (could be virtual)...
Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Wed Jul 27 23:44:27 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Jul 2016 22:44:27 +0000 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Right, I know that I can disable CCR, and I?m asking if this seemingly broken behavior of GPFS commands when the cluster is down was the expected mode of operation with CCR enabled. Sounds like it from the responses thus far. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McPheeters, Gordon Sent: Wednesday, July 27, 2016 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... 
From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com. mmchconfig: Command failed. Examine previous error messages to determine cause. Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
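For anyone who does fall back to the traditional repository as Gordon suggests, the sequence would look roughly like the sketch below. The server names are placeholders rather than hosts from this thread, and the whole cluster has to be down first, so treat it as an outline to verify against the man page rather than a recipe:

# Outline only: revert from CCR to primary/backup configuration servers.
# "nsd01" and "nsd02" are placeholder node names.
mmshutdown -a                                 # all nodes must be down first
mmchcluster --ccr-disable -p nsd01 -s nsd02   # confirm exact options with 'man mmchcluster' on your release
mmstartup -a
mmlscluster                                   # repository type should now show the server-based configuration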
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsanjay at us.ibm.com Thu Jul 28 00:04:35 2016 From: gsanjay at us.ibm.com (Sanjay Gandhi) Date: Wed, 27 Jul 2016 16:04:35 -0700 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 54, Issue 63 In-Reply-To: References: Message-ID: Check mmsdrserv is running on all quorum nodes. mmlscluster should start mmsdrserv if it is not running. Thanks, Sanjay Gandhi GPFS FVT IBM, Beaverton Phone/FAX : 503-578-4141 T/L 775-4141 gsanjay at us.ibm.com From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 03:44 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 63 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: CCR troubles (Bryan Banister) ---------------------------------------------------------------------- Message: 1 Date: Wed, 27 Jul 2016 22:44:27 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D at CHI-EXCHANGEW1.w2k.jumptrading.com> Content-Type: text/plain; charset="utf-8" Right, I know that I can disable CCR, and I?m asking if this seemingly broken behavior of GPFS commands when the cluster is down was the expected mode of operation with CCR enabled. Sounds like it from the responses thus far. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McPheeters, Gordon Sent: Wednesday, July 27, 2016 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles mmchcluster has an option: ??ccr?disable Reverts to the traditional primary or backup configuration server semantics and destroys the CCR environment. All nodes must be shut down before disabling CCR. -Gordon On Jul 27, 2016, at 5:29 PM, Bryan Banister > wrote: Hi Marc, I do understand the principal you describe. The quorum nodes are accessible over TCP/IP but GPFS happens to be down. 
I think that CCR should would work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster. I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all. If you have to have GPFS up to make config changes because of CCR then how can you fix this issue? Thanks for the response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Wednesday, July 27, 2016 1:03 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to display and/or change configuration paramters. You do need at least a majority of the nodes to be up and in communcation (e.g. can talk to each other by tcp/ip) --ccr-enable Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible. Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...) to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database. Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer. (You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you, I don't think CCR guards against Byzantine failures... The minority guy could just be out of touch for a while...) I advise that you do some testing on a test cluster (could be virtual)... From: Bryan Banister > To: "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org< mailto:gpfsug-discuss at spectrumscale.org>)" > Date: 07/27/2016 01:37 PM Subject: [gpfsug-discuss] CCR troubles Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR: # mmgetstate -aL # Which stalls for a really stupid amount of time and then spits out: get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. And trying to change tuning parameters now also barfs when GPFS is down: # [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. # mmchconfig worker1Threads=128,prefetchThreads=128 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com< http://fpia-gpfs-jcsdr01.grid.jumptrading.com/>. mmchconfig: Command failed. Examine previous error messages to determine cause. 
Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting. Is this really the new mode of operation for CCR enabled clusters? I searched CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with explanation. If so, then maybe I?ll go back to non CCR, -Bryan ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20160727/ea365c46/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 63 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Thu Jul 28 05:23:34 2016 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 28 Jul 2016 06:23:34 +0200 Subject: [gpfsug-discuss] CCR troubles In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: An HTML attachment was scrubbed... URL: From radhika.p at in.ibm.com Thu Jul 28 06:43:13 2016 From: radhika.p at in.ibm.com (Radhika A Parameswaran) Date: Thu, 28 Jul 2016 11:13:13 +0530 Subject: [gpfsug-discuss] Re. AFM Crashing the MDS In-Reply-To: References: Message-ID: Luke, AFM is not tested for cascading configurations, this is getting added into the documentation for 4.2.1: "Cascading of AFM caches is not tested." Thanks and Regards Radhika From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 04:30 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 59 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. AFM Crashing the MDS (Luke Raimbach) ---------------------------------------------------------------------- Message: 1 Date: Tue, 26 Jul 2016 14:17:35 +0000 From: Luke Raimbach To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Content-Type: text/plain; charset="utf-8" Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
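As a rough sketch, that prefetch step could be driven along the lines below. The file system and fileset names are placeholders, and mmafmctl option spellings have shifted between releases, so check the man page on your build before copying this:

# Sketch only: warm an AFM cache fileset before users read or modify it.
# "fs1" and "migrationset" are placeholder names, not taken from this thread.

# 1. Build a list of files to pull into the cache.
find /gpfs/fs1/migrationset -type f > /tmp/migrationset.list

# 2. Queue the prefetch on the gateway for that fileset.
mmafmctl fs1 prefetch -j migrationset --list-file /tmp/migrationset.list

# 3. Watch the fileset state and queue length while the prefetch drains.
mmafmctl fs1 getstate -j migrationset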
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 59 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Thu Jul 28 09:30:59 2016 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 28 Jul 2016 08:30:59 +0000 Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: Dear Radhika, In the early days of AFM and at two separate GPFS UK User Group meetings, I discussed AFM cache chaining with IBM technical people plus at least one developer. My distinct recollection of the outcome was that cache chaining was supported. Nevertheless, the difference between what my memory tells me and what is being reported now is irrelevant. We are stuck with large volumes of data being migrated in this fashion, so there is clearly a customer use case for chaining AFM caches. It would be much more helpful if IBM could take on this case and look at the suspected bug that's been chased out here. Real world observation in the field is that queuing large numbers of metadata updates on the MDS itself causes this crash, whereas issuing the updates from another node in the cache cluster adds to the MDS queue and the crash does not happen. My guess is that there is a bug whereby daemon-local additions to the MDS queue aren't handled correctly (further speculation is that there is a memory leak for local MDS operations, but that needs more testing which I don't have time for - perhaps IBM could try it out?); however, when a metadata update operation is sent through an RPC from another node, it is added to the queue and handled correctly. A workaround, if you will. Other minor observations here are that the further down the chain of caches you are, the larger you should set afmDisconnectTimeout as any intermediate cache recovery time needs to be taken into account following a disconnect event. Initially, this was slightly counterintuitive because caches B and C as described below are connected over multiple IB interfaces and shouldn't disconnect except when there's some other failure. Conversely, the connection between cache A and B is over a very flaky wide area network and although we've managed to tune out a lot of the problems introduced by high and variable latency, the view of cache A from cache B's perspective still sometimes gets suspended. The failure observed above doesn't really feel like it's an artefact of cascading caches, but a bug in MDS code as described. Sharing background information about the cascading cache setup was in the spirit of the mailing list and might have led IBM or other customers attempting this kind of setup to share some of their experiences. Hope you can help. Luke. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Radhika A Parameswaran Sent: 28 July 2016 06:43 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Re. 
AFM Crashing the MDS Luke, AFM is not tested for cascading configurations, this is getting added into the documentation for 4.2.1: "Cascading of AFM caches is not tested." Thanks and Regards Radhika From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 07/27/2016 04:30 PM Subject: gpfsug-discuss Digest, Vol 54, Issue 59 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. AFM Crashing the MDS (Luke Raimbach) ---------------------------------------------------------------------- Message: 1 Date: Tue, 26 Jul 2016 14:17:35 +0000 From: Luke Raimbach > To: gpfsug main discussion list > Subject: [gpfsug-discuss] AFM Crashing the MDS Message-ID: > Content-Type: text/plain; charset="utf-8" Hi All, Anyone seen GPFS barf like this before? I'll explain the setup: RO AFM cache on remote site (cache A) for reading remote datasets quickly, LU AFM cache at destination site (cache B) for caching data from cache A (has a local compute cluster mounting this over multi-cluster), IW AFM cache at destination site (cache C) for presenting cache B over NAS protocols, Reading files in cache C should pull data from the remote source through cache A->B->C Modifying files in cache C should pull data into cache B and then break the cache relationship for that file, converting it to a local copy. Those modifications should include metadata updates (e.g. chown). To speed things up we prefetch files into cache B for datasets which are undergoing migration and have entered a read-only state at the source. 
When issuing chown on a directory in cache C containing ~4.5million files, the MDS for the AFM cache C crashes badly: Tue Jul 26 13:28:52.487 2016: [X] logAssertFailed: addr.isReserved() || addr.getClusterIdx() == clusterIdx Tue Jul 26 13:28:52.488 2016: [X] return code 0, reason code 1, log record tag 0 Tue Jul 26 13:28:53.392 2016: [X] *** Assert exp(addr.isReserved() || addr.getClusterIdx() == clusterIdx) in line 1936 of file /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h Tue Jul 26 13:28:53.393 2016: [E] *** Traceback: Tue Jul 26 13:28:53.394 2016: [E] 2:0x7F6DC95444A6 logAssertFailed + 0x2D6 at ??:0 Tue Jul 26 13:28:53.395 2016: [E] 3:0x7F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 0x4B4 at ??:0 Tue Jul 26 13:28:53.396 2016: [E] 4:0x7F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 0x91 at ??:0 Tue Jul 26 13:28:53.397 2016: [E] 5:0x7F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 0x346 at ??:0 Tue Jul 26 13:28:53.398 2016: [E] 6:0x7F6DC9332494 HandleMBPcache(MBPcacheParms*) + 0xB4 at ??:0 Tue Jul 26 13:28:53.399 2016: [E] 7:0x7F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 0x3C3 at ??:0 Tue Jul 26 13:28:53.400 2016: [E] 8:0x7F6DC908BC06 Thread::callBody(Thread*) + 0x46 at ??:0 Tue Jul 26 13:28:53.401 2016: [E] 9:0x7F6DC907A0D2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Tue Jul 26 13:28:53.402 2016: [E] 10:0x7F6DC87A3AA1 start_thread + 0xD1 at ??:0 Tue Jul 26 13:28:53.403 2016: [E] 11:0x7F6DC794A93D clone + 0x6D at ??:0 mmfsd: /project/sprelbmd0/build/rbmd0s003a/src/avs/fs/mmfs/ts/cfgmgr/cfgmgr.h:1936: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `addr.isReserved() || addr.getClusterIdx() == clusterIdx' failed. Tue Jul 26 13:28:53.404 2016: [N] Signal 6 at location 0x7F6DC7894625 in process 6262, link reg 0xFFFFFFFFFFFFFFFF. 
Tue Jul 26 13:28:53.405 2016: [I] rax 0x0000000000000000 rbx 0x00007F6DC8DCB000 Tue Jul 26 13:28:53.406 2016: [I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006 Tue Jul 26 13:28:53.407 2016: [I] rsp 0x00007F6DAAEA01F8 rbp 0x00007F6DCA05C8B0 Tue Jul 26 13:28:53.408 2016: [I] rsi 0x00000000000018F8 rdi 0x0000000000001876 Tue Jul 26 13:28:53.409 2016: [I] r8 0xFEFEFEFEFEFEFEFF r9 0xFEFEFEFEFF092D63 Tue Jul 26 13:28:53.410 2016: [I] r10 0x0000000000000008 r11 0x0000000000000202 Tue Jul 26 13:28:53.411 2016: [I] r12 0x00007F6DC9FC5540 r13 0x00007F6DCA05C1C0 Tue Jul 26 13:28:53.412 2016: [I] r14 0x0000000000000000 r15 0x0000000000000000 Tue Jul 26 13:28:53.413 2016: [I] rip 0x00007F6DC7894625 eflags 0x0000000000000202 Tue Jul 26 13:28:53.414 2016: [I] csgsfs 0x0000000000000033 err 0x0000000000000000 Tue Jul 26 13:28:53.415 2016: [I] trapno 0x0000000000000000 oldmsk 0x0000000010017807 Tue Jul 26 13:28:53.416 2016: [I] cr2 0x0000000000000000 Tue Jul 26 13:28:54.225 2016: [D] Traceback: Tue Jul 26 13:28:54.226 2016: [D] 0:00007F6DC7894625 raise + 35 at ??:0 Tue Jul 26 13:28:54.227 2016: [D] 1:00007F6DC7895E05 abort + 175 at ??:0 Tue Jul 26 13:28:54.228 2016: [D] 2:00007F6DC788D74E __assert_fail_base + 11E at ??:0 Tue Jul 26 13:28:54.229 2016: [D] 3:00007F6DC788D810 __assert_fail + 50 at ??:0 Tue Jul 26 13:28:54.230 2016: [D] 4:00007F6DC95444CA logAssertFailed + 2FA at ??:0 Tue Jul 26 13:28:54.231 2016: [D] 5:00007F6DC95C7EF4 ClusterConfiguration::getGatewayNewHash(DiskUID, unsigned int, NodeAddr*) + 4B4 at ??:0 Tue Jul 26 13:28:54.232 2016: [D] 6:00007F6DC95C8031 ClusterConfiguration::getGatewayNode(DiskUID, unsigned int, NodeAddr, NodeAddr*, unsigned int) + 91 at ??:0 Tue Jul 26 13:28:54.233 2016: [D] 7:00007F6DC9DC7126 SFSPcache(StripeGroup*, FileUID, int, int, void*, int, voidXPtr*, int*) + 346 at ??:0 Tue Jul 26 13:28:54.234 2016: [D] 8:00007F6DC9332494 HandleMBPcache(MBPcacheParms*) + B4 at ??:0 Tue Jul 26 13:28:54.235 2016: [D] 9:00007F6DC90A4A53 Mailbox::msgHandlerBody(void*) + 3C3 at ??:0 Tue Jul 26 13:28:54.236 2016: [D] 10:00007F6DC908BC06 Thread::callBody(Thread*) + 46 at ??:0 Tue Jul 26 13:28:54.237 2016: [D] 11:00007F6DC907A0D2 Thread::callBodyWrapper(Thread*) + A2 at ??:0 Tue Jul 26 13:28:54.238 2016: [D] 12:00007F6DC87A3AA1 start_thread + D1 at ??:0 Tue Jul 26 13:28:54.239 2016: [D] 13:00007F6DC794A93D clone + 6D at ??:0 Tue Jul 26 13:28:54.240 2016: [N] Restarting mmsdrserv Tue Jul 26 13:28:55.535 2016: [N] Signal 6 at location 0x7F6DC790EA7D in process 6262, link reg 0xFFFFFFFFFFFFFFFF. Tue Jul 26 13:28:55.536 2016: [N] mmfsd is shutting down. Tue Jul 26 13:28:55.537 2016: [N] Reason for shutdown: Signal handler entered Tue Jul 26 13:28:55 BST 2016: mmcommon mmfsdown invoked. Subsystem: mmfs Status: active Tue Jul 26 13:28:55 BST 2016: /var/mmfs/etc/mmfsdown invoked umount2: Device or resource busy umount: /camp: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy umount: /ingest: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Shutting down NFS daemon: [ OK ] Shutting down NFS mountd: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] Shutting down RPC idmapd: [ OK ] Stopping NFS statd: [ OK ] Ugly, right? Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. 
E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 54, Issue 59 ********************************************** The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From radhika.p at in.ibm.com Thu Jul 28 16:04:44 2016 From: radhika.p at in.ibm.com (Radhika A Parameswaran) Date: Thu, 28 Jul 2016 20:34:44 +0530 Subject: [gpfsug-discuss] AFM Crashing the MDS In-Reply-To: References: Message-ID: Hi Luke, We are explicitly adding cascading to the 4.2.1 documentation as not tested, as we saw few issues during our in-house testing and the tests are not complete. With specific to this use case, we can give it a try and get back to your personal id. Thanks and Regards Radhika -------------- next part -------------- An HTML attachment was scrubbed... URL: From mimarsh2 at vt.edu Thu Jul 28 16:39:09 2016 From: mimarsh2 at vt.edu (Brian Marshall) Date: Thu, 28 Jul 2016 11:39:09 -0400 Subject: [gpfsug-discuss] GPFS on Broadwell processor Message-ID: All, Is there anything special (BIOS option / kernel option) that needs to be done when running GPFS on a Broadwell powered NSD server? Thank you, Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamiedavis at us.ibm.com Thu Jul 28 16:48:06 2016 From: jamiedavis at us.ibm.com (James Davis) Date: Thu, 28 Jul 2016 15:48:06 +0000 Subject: [gpfsug-discuss] GPFS on Broadwell processor In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 18:24:52 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 13:24:52 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. 
[root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
[root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jul 28 18:57:53 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 17:57:53 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn't a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. 
I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? 
Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bbanister at jumptrading.com Thu Jul 28 19:14:05 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 18:14:05 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Hi Marc, So this issue is actually caused by our Systemd setup. We have fully converted over to Systemd to manage the dependency chain needed for GPFS to start properly and also our scheduling system after that. The issue is that when we shutdown GPFS with Systemd this apparently is causing the mmsdrserv and mmccrmonitor processes to also be killed/term'd, probably because these are started in the same CGROUP as GPFS and Systemd kills all processes in this CGROUP when GPFS is stopped. Not sure how to proceed with safeguarding these daemons from Systemd... and real Systemd support in GPFS is basically non-existent at this point. So my problem is actually a Systemd problem, not a CCR problem! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, July 28, 2016 12:58 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn't have anything explaining the "Not enough CCR quorum nodes available" or "Unexpected error from ccr fget mmsdrfs" error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn't a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I'm still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS... could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. 
In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? 
Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. >From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. 
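For what it's worth, one common way to stop a site-specific Systemd unit from reaping mmsdrserv and mmccrmonitor along with the rest of the cgroup is to let mmshutdown do the orderly stop and tell systemd to kill only the unit's main process. The lines below are only a rough sketch under assumptions: "gpfs.service" is a made-up unit name standing in for whatever unit actually starts GPFS at a given site, and the drop-in may need adjusting to that unit's Type= and any existing ExecStop= setting.

# Hypothetical drop-in; "gpfs.service" is an assumed unit name, substitute the real one.
mkdir -p /etc/systemd/system/gpfs.service.d
cat > /etc/systemd/system/gpfs.service.d/10-killmode.conf <<'EOF'
[Service]
# Stop GPFS cleanly rather than letting systemd SIGTERM/SIGKILL the whole cgroup,
# which would also take out helper daemons such as mmsdrserv and mmccrmonitor.
ExecStop=/usr/lpp/mmfs/bin/mmshutdown
# Kill only the unit's main process on "systemctl stop"; other processes in the
# same cgroup are left running.
KillMode=process
EOF
systemctl daemon-reload
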
-------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 19:23:49 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 14:23:49 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: I think the idea is that you should not need to know the details of how ccr and sdrserv are implemented nor how they work. At this moment, I don't! Literally, I just installed GPFS and defined my system with mmcrcluster and so forth and "it just works". As I wrote, just running mmlscluster or mmlsconfig or similar configuration create, list, change, delete commands should start up ccr and sdrserv under the covers. Okay, now "I hear you" -- it ain't working for you today. Presumably it did a while ago? Let's think about that... Troubleshooting 0,1,2 in order of suspicion... 0. Check that you can ping and ssh from each quorum node to every other quorum node. Q*(Q-1) tests 1. Check that you have plenty of free space in /var on each quorum node. Hmmm... we're not talking huge, but see if /var/mmfs/tmp is filled with junk.... Before and After clearing most of that out I had and have: [root at bog-wifi ~]# du -shk /var/mmfs 84532 /var/mmfs ## clean all big and old files out of /var/mmfs/tmp [root at bog-wifi ~]# du -shk /var/mmfs 9004 /var/mmfs Because we know that /var/mmfs is where GPFS store configuration "stuff" - 2. Check that we have GPFS software correctly installed on each quorum node: rpm -qa gpfs.* | xargs rpm --verify From: Bryan Banister To: gpfsug main discussion list Date: 07/28/2016 01:58 PM Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Sent by: gpfsug-discuss-bounces at spectrumscale.org I now see that these mmccrmonitor and mmsdrserv daemons are required for the CCR operations to work. This is just not clear in the error output. Even the GPFS 4.2 Problem Determination Guide doesn?t have anything explaining the ?Not enough CCR quorum nodes available? or ?Unexpected error from ccr fget mmsdrfs? error messages. Thus there is no clear direction on how to fix this issue from the command output, the man pages, nor the Admin Guides. [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr No manual entry for mmccr There isn?t a help for mmccr either, but at least it does print some usage info: [root at fpia-gpfs-jcsdr01 ~]# mmccr -h Unknown subcommand: '-h'Usage: mmccr subcommand common-options subcommand-options... Subcommands: Setup and Initialization: [snip] I?m still not sure how to start these mmccrmonitor and mmsdrserv daemons without starting GPFS? could you tell me how it would be possible? Thanks for sharing details about how this all works Marc, I do appreciate your response! 
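Marc's step 0 above (the Q*(Q-1) ping and ssh checks between quorum nodes) can be scripted in a few lines. This is only a sketch: the node names n2, n3 and n4 are borrowed from the test cluster shown earlier in the thread, so a real quorum node list would have to be substituted (or pulled from mmlscluster whenever it is answering).

# Sketch of the Q*(Q-1) connectivity checks; replace QNODES with your quorum nodes.
QNODES="n2 n3 n4"
for src in $QNODES; do
  for dst in $QNODES; do
    [ "$src" = "$dst" ] && continue
    # From each quorum node, confirm it can both ping and ssh every other quorum node.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$src" \
         "ping -c 1 -W 2 $dst >/dev/null && ssh -o BatchMode=yes -o ConnectTimeout=5 $dst true"
    then echo "$src -> $dst: ping and ssh OK"
    else echo "$src -> $dst: FAILED"
    fi
  done
done
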
-Bryan From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 12:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown Based on experiments on my test cluster, I can assure you that you can list and change GPFS configuration parameters with CCR enabled while GPFS is down. I understand you are having a problem with your cluster, but you are incorrectly disparaging the CCR. In fact you can mmshutdown -a AND kill all GPFS related processes, including mmsdrserv and mmcrmonitor and then issue commands like: mmlscluster, mmlsconfig, mmchconfig Those will work correctly and by-the-way re-start mmsdrserv and mmcrmonitor... (Use command like `ps auxw | grep mm` to find the relevenat processes). But that will not start the main GPFS file manager process mmfsd. GPFS "proper" remains down... For the following commands Linux was "up" on all nodes, but GPFS was shutdown. [root at n2 gpfs-git]# mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 down 4 n5 down 6 n3 down However if a majority of the quorum nodes can not be obtained, you WILL see a sequence of messages like this, after a noticeable "timeout": (For the following test I had three quorum nodes and did a Linux shutdown on two of them...) [root at n2 gpfs-git]# mmlsconfig get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmlsconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 mmchconfig: Unable to obtain the GPFS configuration file lock. mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. mmchconfig: Command failed. Examine previous error messages to determine cause. [root at n2 gpfs-git]# mmgetstate -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmgetstate: Command failed. Examine previous error messages to determine cause. HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it should check! Then re-starting Linux... So I have two of three quorum nodes active, but GPFS still down... ## From n2, login to node n3 that I just rebooted... [root at n2 gpfs-git]# ssh n3 Last login: Thu Jul 28 09:50:53 2016 from n2.frozen ## See if any mm processes are running? ... NOPE! [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep --color=auto mm ## Check the state... notice n4 is powered off... 
[root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Examine the cluster configuration [root at n3 ~]# mmlscluster mmlscluster GPFS cluster information ======================== GPFS cluster name: madagascar.frozen GPFS cluster id: 7399668614468035547 GPFS UID domain: madagascar.frozen Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR GPFS cluster configuration servers: ----------------------------------- Primary server: n2.frozen (not in use) Secondary server: n4.frozen (not in use) Node Daemon node name IP address Admin node name Designation ------------------------------------------------------------------- 1 n2.frozen 172.20.0.21 n2.frozen quorum-manager-perfmon 3 n4.frozen 172.20.0.23 n4.frozen quorum-manager-perfmon 4 n5.frozen 172.20.0.24 n5.frozen perfmon 6 n3.frozen 172.20.0.22 n3.frozen quorum-manager-perfmon ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd [root at n3 ~]# ps auxw | grep mm ps auxw | grep mm root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep --color=auto mm ## Now I can mmchconfig ... while GPFS remains down. [root at n3 ~]# mmchconfig worker1Threads=1022 mmchconfig worker1Threads=1022 mmchconfig: Command successfully completed mmchconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation started Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation completed; mmdsh rc=0 [root at n3 ~]# mmgetstate -a mmgetstate -a Node number Node name GPFS state ------------------------------------------ 1 n2 down 3 n4 unknown 4 n5 down 6 n3 down ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. [root at n3 ~]# ping -c 1 n4 ping -c 1 n4 PING n4.frozen (172.20.0.23) 56(84) bytes of data. From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable --- n4.frozen ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms [root at n3 ~]# exit exit logout Connection to n3 closed. [root at n2 gpfs-git]# ps auwx | grep mm root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep --color=auto mm root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py [root at n2 gpfs-git]# Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From oehmes at gmail.com Thu Jul 28 19:27:20 2016 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 28 Jul 2016 11:27:20 -0700 Subject: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig commands fine with mmshutdown In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: they should get started as soon as you shutdown via mmshutdown could you check a node where the processes are NOT started and simply run mmshutdown on this node to see if they get started ? On Thu, Jul 28, 2016 at 10:57 AM, Bryan Banister wrote: > I now see that these mmccrmonitor and mmsdrserv daemons are required for > the CCR operations to work. This is just not clear in the error output. > Even the GPFS 4.2 Problem Determination Guide doesn?t have anything > explaining the ?Not enough CCR quorum nodes available? or ?Unexpected error > from ccr fget mmsdrfs? error messages. Thus there is no clear direction on > how to fix this issue from the command output, the man pages, nor the Admin > Guides. > > > > [root at fpia-gpfs-jcsdr01 ~]# man -E ascii mmccr > > No manual entry for mmccr > > > > There isn?t a help for mmccr either, but at least it does print some usage > info: > > > > [root at fpia-gpfs-jcsdr01 ~]# mmccr -h > > Unknown subcommand: '-h'Usage: mmccr subcommand common-options > subcommand-options... > > > > Subcommands: > > > > Setup and Initialization: > > [snip] > > > > I?m still not sure how to start these mmccrmonitor and mmsdrserv daemons > without starting GPFS? could you tell me how it would be possible? > > > > Thanks for sharing details about how this all works Marc, I do appreciate > your response! > > -Bryan > > > > *From:* gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] *On Behalf Of *Marc A Kaplan > *Sent:* Thursday, July 28, 2016 12:25 PM > *To:* gpfsug main discussion list > *Subject:* Re: [gpfsug-discuss] CCR troubles - CCR and mmXXconfig > commands fine with mmshutdown > > > > Based on experiments on my test cluster, I can assure you that you can > list and change GPFS configuration parameters with CCR enabled while GPFS > is down. > > I understand you are having a problem with your cluster, but you are > incorrectly disparaging the CCR. 
> > In fact you can mmshutdown -a AND kill all GPFS related processes, > including mmsdrserv and mmcrmonitor and then issue commands like: > > mmlscluster, mmlsconfig, mmchconfig > > Those will work correctly and by-the-way re-start mmsdrserv and > mmcrmonitor... > (Use command like `ps auxw | grep mm` to find the relevenat processes). > > But that will not start the main GPFS file manager process mmfsd. GPFS > "proper" remains down... > > For the following commands Linux was "up" on all nodes, but GPFS was > shutdown. > [root at n2 gpfs-git]# mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 down > 4 n5 down > 6 n3 down > > However if a majority of the quorum nodes can not be obtained, you WILL > see a sequence of messages like this, after a noticeable "timeout": > (For the following test I had three quorum nodes and did a Linux shutdown > on two of them...) > > [root at n2 gpfs-git]# mmlsconfig > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmlsconfig: Command failed. Examine previous error messages to determine > cause. > > [root at n2 gpfs-git]# mmchconfig worker1Threads=1022 > mmchconfig: Unable to obtain the GPFS configuration file lock. > mmchconfig: GPFS was unable to obtain a lock from node n2.frozen. > mmchconfig: Command failed. Examine previous error messages to determine > cause. > > [root at n2 gpfs-git]# mmgetstate -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmgetstate: Command failed. Examine previous error messages to determine > cause. > > HMMMM.... notice mmgetstate needs a quorum even to "know" what nodes it > should check! > > Then re-starting Linux... So I have two of three quorum nodes active, but > GPFS still down... > > ## From n2, login to node n3 that I just rebooted... > [root at n2 gpfs-git]# ssh n3 > Last login: Thu Jul 28 09:50:53 2016 from n2.frozen > > ## See if any mm processes are running? ... NOPE! > > [root at n3 ~]# ps auxw | grep mm > ps auxw | grep mm > root 3834 0.0 0.0 112640 972 pts/0 S+ 10:12 0:00 grep > --color=auto mm > > ## Check the state... notice n4 is powered off... 
> [root at n3 ~]# mmgetstate -a > mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 unknown > 4 n5 down > 6 n3 down > > ## Examine the cluster configuration > [root at n3 ~]# mmlscluster > mmlscluster > > GPFS cluster information > ======================== > GPFS cluster name: madagascar.frozen > GPFS cluster id: 7399668614468035547 > GPFS UID domain: madagascar.frozen > Remote shell command: /usr/bin/ssh > Remote file copy command: /usr/bin/scp > Repository type: CCR > > GPFS cluster configuration servers: > ----------------------------------- > Primary server: n2.frozen (not in use) > Secondary server: n4.frozen (not in use) > > Node Daemon node name IP address Admin node name Designation > ------------------------------------------------------------------- > 1 n2.frozen 172.20.0.21 n2.frozen > quorum-manager-perfmon > 3 n4.frozen 172.20.0.23 n4.frozen > quorum-manager-perfmon > 4 n5.frozen 172.20.0.24 n5.frozen perfmon > 6 n3.frozen 172.20.0.22 n3.frozen > quorum-manager-perfmon > > ## notice that mmccrmonitor and mmsdrserv are running but not mmfsd > > [root at n3 ~]# ps auxw | grep mm > ps auxw | grep mm > root 3882 0.0 0.0 114376 1720 pts/0 S 10:13 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 3954 0.0 0.0 491244 13040 ? Ssl 10:13 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes > root 4339 0.0 0.0 114376 796 pts/0 S 10:15 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 4345 0.0 0.0 112640 972 pts/0 S+ 10:16 0:00 grep > --color=auto mm > > ## Now I can mmchconfig ... while GPFS remains down. > > [root at n3 ~]# mmchconfig worker1Threads=1022 > mmchconfig worker1Threads=1022 > mmchconfig: Command successfully completed > mmchconfig: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > [root at n3 ~]# Thu Jul 28 10:18:16 PDT 2016: mmcommon pushSdr_async: > mmsdrfs propagation started > Thu Jul 28 10:18:21 PDT 2016: mmcommon pushSdr_async: mmsdrfs propagation > completed; mmdsh rc=0 > > [root at n3 ~]# mmgetstate -a > mmgetstate -a > > Node number Node name GPFS state > ------------------------------------------ > 1 n2 down > 3 n4 unknown > 4 n5 down > 6 n3 down > > ## Quorum node n4 remains unreachable... But n2 and n3 are running Linux. > [root at n3 ~]# ping -c 1 n4 > ping -c 1 n4 > PING n4.frozen (172.20.0.23) 56(84) bytes of data. > From n3.frozen (172.20.0.22) icmp_seq=1 Destination Host Unreachable > > --- n4.frozen ping statistics --- > 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms > > [root at n3 ~]# exit > exit > logout > Connection to n3 closed. > [root at n2 gpfs-git]# ps auwx | grep mm > root 3264 0.0 0.0 114376 812 pts/1 S 10:21 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 3271 0.0 0.0 112640 980 pts/1 S+ 10:21 0:00 grep > --color=auto mm > root 31820 0.0 0.0 114376 1728 pts/1 S 09:42 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 32058 0.0 0.0 493264 12000 ? Ssl 09:42 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 > root 32263 0.0 0.0 1700732 17600 ? Sl 09:42 0:00 python > /usr/lpp/mmfs/bin/mmsysmon.py > [root at n2 gpfs-git]# > > > ------------------------------ > > Note: This email is for the confidential use of the named addressee(s) > only and may contain proprietary, confidential or privileged information. 
> If you are not the intended recipient, you are hereby notified that any > review, dissemination or copying of this email is strictly prohibited, and > to please notify the sender immediately and destroy this email and any > attachments. Email transmission cannot be guaranteed to be secure or > error-free. The Company, therefore, does not make any guarantees as to the > completeness or accuracy of this email or any attachments. This email is > for informational purposes only and does not constitute a recommendation, > offer, request or solicitation of any kind to buy, sell, subscribe, redeem > or perform any type of transaction of a financial product. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 19:39:48 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 14:39:48 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: My experiments show that any of the mmXXX commands that require ccr will start ccr and sdrserv. So unless you have a daeamon actively seeking and killing ccr, I don't see why systemd is a problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Jul 28 19:44:28 2016 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Jul 2016 18:44:28 +0000 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Yeah, not sure why yet but when I shutdown the cluster using our Systemd configuration this kills the daemons, but mmshutdown obviously doesn't. I'll dig into my problems with that. Cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, July 28, 2016 1:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CCR troubles - CCR vs systemd My experiments show that any of the mmXXX commands that require ccr will start ccr and sdrserv. So unless you have a daeamon actively seeking and killing ccr, I don't see why systemd is a problem. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. 
If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jul 28 21:16:30 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 28 Jul 2016 16:16:30 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: Allow me to restate and demonstrate: Even if systemd or any explicit kill signals destroy any/all running mmcr* and mmsdr* processes, simply running mmlsconfig will fire up new mmcr* and mmsdr* processes. For example: ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes [root at n2 gpfs-git]# ps auwx | grep mm root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep --color=auto mm [root at n2 gpfs-git]# mmlsconfig Configuration data for cluster madagascar.frozen: ------------------------------------------------- clusterName madagascar.frozen ... worker1Threads 1022 adminMode central File systems in cluster madagascar.frozen: ------------------------------------------ /dev/mak /dev/x1 /dev/yy /dev/zz ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it restarts them! [root at n2 gpfs-git]# ps auwx | grep mm root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 root 10358 0.0 0.0 1700488 17636 ? Sl 12:58 0:00 python /usr/lpp/mmfs/bin/mmsysmon.py root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep --color=auto mm -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.s.knister at nasa.gov Thu Jul 28 22:29:22 2016 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 28 Jul 2016 17:29:22 -0400 Subject: [gpfsug-discuss] CCR troubles - CCR vs systemd In-Reply-To: References: <21BC488F0AEA2245B2C3E83FC0B33DBB06291E06@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629356D@CHI-EXCHANGEW1.w2k.jumptrading.com> <2E0972E4-BB38-43CA-9613-DF8149A35098@anl.gov> <21BC488F0AEA2245B2C3E83FC0B33DBB0629373D@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295D34@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB06295FBD@CHI-EXCHANGEW1.w2k.jumptrading.com> <21BC488F0AEA2245B2C3E83FC0B33DBB0629615F@CHI-EXCHANGEW1.w2k.jumptrading.com> Message-ID: <83afcc2a-699a-d0b8-4f89-5e9dd7d3370e@nasa.gov> Hi Marc, I've seen systemd be overly helpful (read: not at all helpful) when it observes state changing outside of its control. There was a bug I encountered with GPFS (although the real issue may have been systemd, but the fix was put into GPFS) by which GPFS filesystems would get unmounted a split second after they were mounted, by systemd. The fs would mount but systemd decided the /dev/$fs device wasn't "ready" so it helpfully unmounted the filesystem. I don't know much about systemd (avoiding it) but based on my experience with it I could certainly see a case where systemd may actively kill the sdrserv process shortly after it's started by the mm* commands if systemd doesn't expect it to be running. I'd be curious to see the output of /var/adm/ras/mmsdrserv.log from the manager nodes to see if sdrserv is indeed starting but getting harpooned by systemd. -Aaron On 7/28/16 4:16 PM, Marc A Kaplan wrote: > Allow me to restate and demonstrate: > > Even if systemd or any explicit kill signals destroy any/all running > mmcr* and mmsdr* processes, > > simply running mmlsconfig will fire up new mmcr* and mmsdr* processes. > For example: > > ## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes > > [root at n2 gpfs-git]# ps auwx | grep mm > root 9891 0.0 0.0 112640 980 pts/1 S+ 12:57 0:00 grep > --color=auto mm > > [root at n2 gpfs-git]# mmlsconfig > Configuration data for cluster madagascar.frozen: > ------------------------------------------------- > clusterName madagascar.frozen > ... > worker1Threads 1022 > adminMode central > > File systems in cluster madagascar.frozen: > ------------------------------------------ > /dev/mak > /dev/x1 > /dev/yy > /dev/zz > > ## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it > restarts them! > > [root at n2 gpfs-git]# ps auwx | grep mm > root 9929 0.0 0.0 114376 1696 pts/1 S 12:58 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 10110 0.0 0.0 20536 128 ? Ss 12:58 0:00 > /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac > root 10125 0.0 0.0 493264 11064 ? Ssl 12:58 0:00 > /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1 > root 10358 0.0 0.0 1700488 17636 ? 
Sl 12:58 0:00 python > /usr/lpp/mmfs/bin/mmsysmon.py > root 10440 0.0 0.0 114376 804 pts/1 S 12:59 0:00 > /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > root 10442 0.0 0.0 112640 976 pts/1 S+ 12:59 0:00 grep > --color=auto mm > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From Kevin.Buterbaugh at Vanderbilt.Edu Fri Jul 29 16:56:14 2016 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 29 Jul 2016 15:56:14 +0000 Subject: [gpfsug-discuss] mmchqos and already running maintenance commands Message-ID: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> Hi All, Looking for a little clarification here ? in the man page for mmchqos I see: * When you change allocations or mount the file system, a brief delay due to reconfiguration occurs before QoS starts applying allocations. If I?m already running a maintenance command and then I run an mmchqos does that mean that the already running maitenance command will adjust to the new settings or does this only apply to subsequently executed maintenance commands? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jul 29 18:18:22 2016 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 29 Jul 2016 17:18:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2.1 Released Message-ID: <5E104D88-1A80-4FF2-B721-D0BF4B930CCE@nuance.com> Version 4.2.1 is out on Fix Central and has a bunch of new features and improvements, many of which have been discussed at recent user group meetings. What's new: http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1xx_soc.htm Bob Oesterlin Sr Storage Engineer, Nuance HPC Grid 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Jul 29 18:57:31 2016 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 29 Jul 2016 13:57:31 -0400 Subject: [gpfsug-discuss] mmchqos and already running maintenance commands In-Reply-To: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> References: <732CF329-476A-4C96-A3CA-6E35620A548E@vanderbilt.edu> Message-ID: mmchqos fs --enable ... maintenance=1234iops ... Will apply the new settings to all currently running and future maintenance commands. There is just a brief delay (I think it is well under 30 seconds) for the new settings to be propagated and become effective on each node. You can use `mmlsqos fs --seconds 70` to observe performance. Better, install gnuplot and run samples/charts/qosplot.pl or hack the script to push the data into your favorite plotter. --marc From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 07/29/2016 11:57 AM Subject: [gpfsug-discuss] mmchqos and already running maintenance commands Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Looking for a little clarification here ? in the man page for mmchqos I see: * When you change allocations or mount the file system, a brief delay due to reconfiguration occurs before QoS starts applying allocations. 
If I'm already running a maintenance command and then I run an mmchqos does that mean that the already running maintenance command will adjust to the new settings or does this only apply to subsequently executed maintenance commands? Thanks... - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
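To make Marc's answer concrete, the sequence below is a rough sketch only; the device name gpfs0, the pool name and the 2000IOPS figure are placeholders, not values from this thread. The idea is that the maintenance allocation can be changed while a restripe or fsck is already running, and after the brief propagation delay the running command is held to the new limit, which mmlsqos lets you observe.

# Placeholders: "gpfs0", pool "system" and 2000IOPS are examples only.
# While e.g. an mmrestripefs is already running, change the maintenance allocation:
mmchqos gpfs0 --enable pool=system,maintenance=2000IOPS,other=unlimited
# After a short delay the running maintenance command adjusts to the new limit.
# Watch the IOPS actually consumed per QoS class over the next minute or so:
mmlsqos gpfs0 --seconds 70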