From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 17:55:04 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 1 Aug 2018 16:55:04 +0000
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Message-ID: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>

Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment.

I "successfully" created a filesystem, but I believe the sub-block size is wrong. I'm using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K:

Table 1. Block sizes and subblock sizes

+-------------------------------+-------------------------------+
| Block size                    | Subblock size                 |
+-------------------------------+-------------------------------+
| 64 KiB                        | 2 KiB                         |
+-------------------------------+-------------------------------+
| 128 KiB                       | 4 KiB                         |
+-------------------------------+-------------------------------+
| 256 KiB, 512 KiB, 1 MiB, 2    | 8 KiB                         |
| MiB, 4 MiB                    |                               |
+-------------------------------+-------------------------------+
| 8 MiB, 16 MiB                 | 16 KiB                        |
+-------------------------------+-------------------------------+

However, it appears that it's 8K for the system pool but 32K for the other pools:

flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    32768                    Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 2                        Default number of metadata replicas
 -M                 3                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 3                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 1048576                  Block size (system pool)
                    4194304                  Block size (other pools)
 -Q                 user;group;fileset       Quotas accounting enabled
                    user;group;fileset       Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 19.01 (5.0.1.0)          File system version
 --create-time      Wed Aug 1 11:39:39 2018  File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      101095424                Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128              Number of subblocks per full block
 -P                 system;raid1;raid6       Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs5                   Default mount point
 --mount-priority   0                        Mount priority

Output of mmcrfs:

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M

The following disks of gpfs5 will be formatted on node testnsd3:
    test21A3nsd: size 953609 MB
    test21A4nsd: size 953609 MB
    test21B3nsd: size 953609 MB
    test21B4nsd: size 953609 MB
    test23Ansd: size 15259744 MB
    test23Bnsd: size 15259744 MB
    test23Cnsd: size 1907468 MB
    test24Ansd: size 15259744 MB
    test24Bnsd: size 15259744 MB
    test24Cnsd: size 1907468 MB
    test25Ansd: size 15259744 MB
    test25Bnsd: size 15259744 MB
    test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
   8 % complete on Wed Aug 1 11:39:19 2018
  18 % complete on Wed Aug 1 11:39:24 2018
  27 % complete on Wed Aug 1 11:39:29 2018
  37 % complete on Wed Aug 1 11:39:34 2018
  48 % complete on Wed Aug 1 11:39:39 2018
  60 % complete on Wed Aug 1 11:39:44 2018
  72 % complete on Wed Aug 1 11:39:49 2018
  83 % complete on Wed Aug 1 11:39:54 2018
  95 % complete on Wed Aug 1 11:39:59 2018
 100 % complete on Wed Aug 1 11:40:01 2018
Creating Allocation Maps
Creating Log Files
   3 % complete on Wed Aug 1 11:40:07 2018
  28 % complete on Wed Aug 1 11:40:14 2018
  53 % complete on Wed Aug 1 11:40:19 2018
  78 % complete on Wed Aug 1 11:40:24 2018
 100 % complete on Wed Aug 1 11:40:25 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
  85 % complete on Wed Aug 1 11:40:32 2018
 100 % complete on Wed Aug 1 11:40:33 2018
Formatting Allocation Map for storage pool raid1
  53 % complete on Wed Aug 1 11:40:38 2018
 100 % complete on Wed Aug 1 11:40:42 2018
Formatting Allocation Map for storage pool raid6
  20 % complete on Wed Aug 1 11:40:47 2018
  39 % complete on Wed Aug 1 11:40:52 2018
  60 % complete on Wed Aug 1 11:40:57 2018
  79 % complete on Wed Aug 1 11:41:02 2018
 100 % complete on Wed Aug 1 11:41:08 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes. This is an asynchronous process.
And contents of stanza file:

%nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15
%nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14
%nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17
%nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16
%nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10
%nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9
%nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5
%nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6
%nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0
%nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2
%nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6
%nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6
%nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3

%pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no
%pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no
%pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no

What am I missing, or what have I done wrong? Thanks...

Kevin
--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633


From makaplan at us.ibm.com Wed Aug 1 18:21:01 2018
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Wed, 1 Aug 2018 13:21:01 -0400
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
In-Reply-To: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>
References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>
Message-ID:

I haven't looked into all the details, but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get.

[Quoted copy of Kevin's original message trimmed.]
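For reference, the man-page mapping Kevin quotes can be written down as a small lookup. This is only an illustrative sketch of Table 1 above (the block-size/subblock pairs come from the quoted table; the Python itself is not GPFS code):

def documented_subblock_kib(block_kib):
    # Subblock size (KiB) that Table 1 of the mmcrfs man page lists for a block size (KiB).
    if block_kib == 64:
        return 2
    if block_kib == 128:
        return 4
    if block_kib in (256, 512, 1024, 2048, 4096):
        return 8
    if block_kib in (8192, 16384):
        return 16
    raise ValueError("block size not listed in Table 1")

print(documented_subblock_kib(1024))  # 1 MiB metadata block size -> 8 KiB
print(documented_subblock_kib(4096))  # 4 MiB data block size     -> 8 KiB, which is what Kevin expected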
From knop at us.ibm.com Wed Aug 1 19:21:28 2018
From: knop at us.ibm.com (Felipe Knop)
Date: Wed, 1 Aug 2018 14:21:28 -0400
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
In-Reply-To:
References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>
Message-ID:

Marc, Kevin,

We'll be looking into this issue, since, at least at first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in

  32768  Minimum fragment (subblock) size in bytes (other pools)

but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs.

  Felipe

----
Felipe Knop    knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314  T/L 293-9314

[Quoted copies of Marc's reply and Kevin's original message trimmed.]
From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:08:08 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 1 Aug 2018 18:08:08 +0000
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
In-Reply-To:
References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>
Message-ID: <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu>

Hi Marc,

Thanks for the response. I understand what you're saying, but since I'm asking for a 1 MB block size for metadata and a 4 MB block size for data, and according to the chart in the mmcrfs man page both result in an 8 KB sub-block size, I'm still confused as to why I've got a 32 KB sub-block size for my non-system (i.e. data) pools? Especially when you consider that 32 KB isn't the default even if I had chosen an 8 or 16 MB block size!

Kevin
--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

[Quoted copies of Marc's reply and Kevin's original message trimmed.]
From oehmes at gmail.com Wed Aug 1 19:41:05 2018
From: oehmes at gmail.com (Sven Oehme)
Date: Wed, 1 Aug 2018 11:41:05 -0700
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
In-Reply-To:
References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>
Message-ID:

The number of subblocks is derived from the smallest blocksize in any pool of a given filesystem. So if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4x that in the data pool if your data pool is 4M.

sven

[Quoted copies of Felipe's reply, Marc's reply, and Kevin's original message trimmed.]
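A minimal sketch of the rule Sven describes, assuming (as this thread suggests rather than as documented behaviour) that a single subblocks-per-full-block value is derived from the smallest block size in the filesystem and then applied to every pool:

def table1_subblock_bytes(block_bytes):
    # Subblock size the mmcrfs man-page table lists for a given block size.
    KiB = 1024
    if block_bytes == 64 * KiB:
        return 2 * KiB
    if block_bytes == 128 * KiB:
        return 4 * KiB
    if block_bytes <= 4096 * KiB:
        return 8 * KiB
    return 16 * KiB

def per_pool_subblock_bytes(pool_block_sizes):
    # Assumption drawn from the thread: the subblock count is fixed by the smallest pool.
    smallest = min(pool_block_sizes)
    subblocks_per_full_block = smallest // table1_subblock_bytes(smallest)
    return {bs: bs // subblocks_per_full_block for bs in pool_block_sizes}

MiB = 1024 * 1024
print(per_pool_subblock_bytes([1 * MiB, 4 * MiB]))
# {1048576: 8192, 4194304: 32768} -- i.e. 128 subblocks per full block, an 8 KiB
# subblock in the 1 MiB metadata pool and 32 KiB in the 4 MiB data pools, which
# matches the mmlsfs output above.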
From makaplan at us.ibm.com Wed Aug 1 19:47:31 2018
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Wed, 1 Aug 2018 14:47:31 -0400
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
In-Reply-To: <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu>
References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu>
Message-ID:

I guess that particular table is not the whole truth, nor a specification, nor a promise, but a simplified summary of what you get when there is just one block size that applies to both meta-data and data-data. You have discovered that it does not apply to systems where metadata has a different blocksize than data-data.

My guesstimate (speculation!) is that the deployed code chooses one subblocks-per-full-block parameter and applies that to both, which would explain the results we're seeing. Further, it seems the mmlsfs command assumes, at least in some places, that there is only one subblocks-per-block parameter...

Looking deeper into the code is another story for another day -- but I'll say that there seems to be sufficient flexibility that if this were deemed a burning issue, there could be further "enhancements..."  ;-)

[Quoted copies of Kevin's reply, Marc's earlier reply, and Kevin's original message trimmed.]
From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:52:37 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 1 Aug 2018 18:52:37 +0000
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
In-Reply-To:
References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu>
Message-ID: <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu>

All,

Sorry for the 2nd e-mail, but I realize that 4 MB is 4 times 1 MB... so does this go back to what Marc is saying, that there's really only one subblocks-per-block parameter? If so, is there any way to get what I want as described below?

Thanks...

Kevin
--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L wrote:

> Hi Sven,
>
> OK, but why? I mean, that's not what the man page says. Where does that "4 x" come from?
>
> And, most importantly, that's not what I want. I want a smaller block size for the system pool since it's metadata only and on RAID 1 mirrors (HDs on the test cluster but SSDs on the production cluster). So, side question: is 1 MB OK there?
>
> But I want a 4 MB block size for data with an 8 KB sub-block. I want good performance for the sane people using our cluster without unduly punishing the, ahem, fine folks whose apps want to create a bazillion tiny files! So how do I do that?
>
> Thanks!

[Quoted copies of Sven's, Felipe's, and Marc's replies and Kevin's original message trimmed.]
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per- From: "Marc A Kaplan" > To: gpfsug main discussion list > Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? 
--inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C8a00ac1e037d45913c8708d5f7de60ac%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687456834221377&sdata=MuPoxpCweqPxLR%2FAaWIgP%2BIkh0bUEVeG3cCzwoZoyE0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Wed Aug 1 20:10:50 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 1 Aug 2018 19:10:50 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Message-ID: Kevin asks: >>>> Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ? so does this go back to what Marc is saying that there?s really only one sub blocks per block parameter? If so, is there any way to get what I want as described below? <<< Yep. 
Basically what's happening is: When you ask for a certain block size, Scale infers the subblock size as shown in the table. As Sven said, here you are asking for 1M blocks for metadata, so you get 8KiB subblocks. So far so good. These two numbers together determine the number of subblocks per block parameter, which as Marc said is shared across all the pools. So in order for your 4M data blocks to have the same number of subblocks per block as your 1M metadata blocks, the subblocks have to be 4 times as big. Something similar would happen with *any* choice of data block size above 1M, of course. The smallest size wins, and the 8KiB number is coming from the 1M, not the 4M. (Thanks, Sven). regards, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:47:47 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:47:47 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: Hi Sven, OK ? but why? I mean, that?s not what the man page says. Where does that ?4 x? come from? And, most importantly ? that?s not what I want. I want a smaller block size for the system pool since it?s metadata only and on RAID 1 mirrors (HD?s on the test cluster but SSD?s on the production cluster). So ? side question ? is 1 MB OK there? But I want a 4 MB block size for data with an 8 KB sub block ? I want good performance for the sane people using our cluster without unduly punishing the ? ahem ? fine folks whose apps want to create a bazillion tiny files! So how do I do that? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:41 PM, Sven Oehme > wrote: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop > wrote: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per- From: "Marc A Kaplan" > To: gpfsug main discussion list > Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. 
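To make the rule described by Sven, Marc, and Carl concrete, here is a minimal sketch, assuming the subblock size is looked up from the mmcrfs table for the smallest block size in the filesystem and the resulting subblocks-per-full-block count is then shared by every pool. Treating the table rows as ranges is a simplification for illustration; this is not GPFS code.

KIB, MIB = 1024, 1024 * 1024

def table_subblock_size(block_size):
    # Subblock size per the mmcrfs man page table quoted above (rows treated as ranges).
    if block_size <= 64 * KIB:
        return 2 * KIB
    if block_size <= 128 * KIB:
        return 4 * KIB
    if block_size <= 4 * MIB:
        return 8 * KIB
    return 16 * KIB                        # 8 MiB and 16 MiB rows

def per_pool_subblock_sizes(block_sizes):
    # The smallest block size fixes the shared subblocks-per-full-block count;
    # every other pool's subblock size then follows from its own block size.
    smallest = min(block_sizes)
    subblocks_per_block = smallest // table_subblock_size(smallest)
    return {bs: bs // subblocks_per_block for bs in sorted(set(block_sizes))}

print(per_pool_subblock_sizes([1 * MIB, 4 * MIB]))   # {1048576: 8192, 4194304: 32768}
print(per_pool_subblock_sizes([4 * MIB, 4 * MIB]))   # {4194304: 8192}

The first case matches the 8192/32768 pair that mmlsfs reports for this filesystem; the second shows why matching block sizes would bring the data pools back to 8 KiB fragments.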
From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
From oehmes at gmail.com Wed Aug 1 22:01:28 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 1 Aug 2018 14:01:28 -0700 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> Message-ID: the only way to get max number of subblocks for a 5.0.x filesystem with the released code is to have metadata and data use the same blocksize.
sven
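A rough way to see what is at stake for the small-file workloads mentioned earlier, assuming a file's data occupies at least one fragment once it no longer fits in the inode: the file size and file count below are made up for illustration, and replication and metadata overhead are ignored.

KIB = 1024

def min_data_footprint(file_size, fragment_size):
    # Smallest multiple of the fragment size that can hold the file's data.
    fragments = -(-file_size // fragment_size)   # ceiling division
    return fragments * fragment_size

file_size = 10 * KIB          # hypothetical small file
file_count = 1_000_000        # hypothetical number of such files
for fragment in (8 * KIB, 32 * KIB):
    total = file_count * min_data_footprint(file_size, fragment)
    print(f"{fragment // KIB} KiB fragments: ~{total / KIB**3:.1f} GiB on disk")
# 8 KiB fragments: ~15.3 GiB on disk
# 32 KiB fragments: ~30.5 GiB on disk

With both pools at 4 MiB, as Sven describes, the data pools get 8 KiB fragments back, at the cost of a 4 MiB metadata block size.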
From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 22:58:26 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 21:58:26 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> Message-ID: <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Hi Sven (and Stephen and everyone else), I know there are certainly things you know but can't talk about, but I suspect that I am not the only one to wonder about the possible significance of "with the released code" in your response below?!? I understand the technical point you're making, and maybe the solution for me is to just use a 4 MB block size for my metadata only system pool? As Stephen Ulmer said in his response ("Why the desire for a 1MB block size for metadata? It is RAID1 so no re-write penalty or need to hit a stripe size. Are you just trying to save the memory? If you had a 4MB block size, an 8KB sub-block size and things were 4K-aligned, you would always read 2 4K inodes."), so if I'm using RAID 1 with 4K inodes then am I gaining anything by going with a smaller block size for metadata? So why was I choosing 1 MB in the first place? Well, I was planning on doing some experimenting with different block sizes for metadata to see if it made any difference. Historically, we had used a metadata block size of 64K to match the hardware "stripe" size on the storage arrays (RAID 1 mirrors of hard drives back in the day). Now our metadata is on SSDs, so with our latest filesystem we used 1 MB for both data and metadata because of the 1/32nd sub-block restriction in GPFS 4.x. Since GPFS 5 removes that restriction, I was going to do some experimenting, but if the correct answer is just "if 4 MB is what's best for your data, then use it for metadata too" then I don't mind saving some time. ;-) Thanks... Kevin
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
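For the metadata side of that experiment, the packing of 4 KiB inodes is simple to sketch; the numbers below are illustrative only and ignore directory blocks, indirect blocks, and the second metadata replica.

KIB, MIB = 1024, 1024 * 1024
INODE_SIZE = 4 * KIB          # -i 4096, as created above

# (metadata block size, fragment size) combinations discussed in the thread
cases = [(1 * MIB, 8 * KIB), (4 * MIB, 8 * KIB)]
for block, fragment in cases:
    print(f"{block // MIB} MiB metadata block, {fragment // KIB} KiB fragment: "
          f"{fragment // INODE_SIZE} inodes per fragment, {block // INODE_SIZE} inodes per full block")
# 1 MiB metadata block, 8 KiB fragment: 2 inodes per fragment, 256 inodes per full block
# 4 MiB metadata block, 8 KiB fragment: 2 inodes per fragment, 1024 inodes per full block

Either way the fragment holds the same two 4 KiB inodes Stephen mentions; the block size mostly changes how much contiguous metadata is read or written at once.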
From makaplan at us.ibm.com Thu Aug 2 01:00:47 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 20:00:47 -0400 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible?
From abeattie at au1.ibm.com Thu Aug 2 01:11:51 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 2 Aug 2018 00:11:51 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: , <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed...
From ulmer at ulmer.org Thu Aug 2 01:52:19 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 1 Aug 2018 20:52:19 -0400 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org> > On Aug 1, 2018, at 8:11 PM, Andrew Beattie wrote: > [...] > > which is probably why 32k sub block was the default for so many years .... I may not be remembering correctly, but I thought the default block size was 256k, and the sub-block size was always fixed at 1/32nd of the block size, which only yields 32k sub-blocks for a 1MB block size. I also think there used to be something special about a 16k block size... but I haven't slept well in about a week, so I might just be losing it. -- Stephen
From abeattie at au1.ibm.com Thu Aug 2 02:10:10 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 2 Aug 2018 01:10:10 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org> References: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org>, <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed...
From scale at us.ibm.com Thu Aug 2 09:44:02 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 2 Aug 2018 16:44:02 +0800 Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org><3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: In released GPFS, we only support one subblocks-per-fullblock per file system. As Sven mentioned, the subblocks-per-fullblock is derived from the smallest block size of the metadata and data pools, and that smallest block size decides the subblocks-per-fullblock and the subblock size of all pools. There's an enhancement plan to have pools with different block sizes and/or subblocks-per-fullblock. Thanks, Yuan, Zheng Cai
From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 2018/08/02 09:10 Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Stephen, Sorry, you're right, I had to go back and look up what we were doing for metadata,
but we ended up with 1MB block for metadata and 8MB for data and a 32k subblock based on the 1MB metadata block size, effectively a 256k subblock for the data. Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com
From makaplan at us.ibm.com Thu Aug 2 16:56:20 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 2 Aug 2018 11:56:20 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: https://www.linkedin.com/in/oehmes/ Apparently, Sven is now "Chief Research Officer at DDN"
From Robert.Oesterlin at nuance.com Thu Aug 2 17:01:58 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 2 Aug 2018 16:01:58 +0000 Subject: [gpfsug-discuss] Sven Oehme now at DDN Message-ID: <4D2B1925-2C14-47F8-A1A5-8E4EBA211462@nuance.com> Yes, I heard about this last week - Best of luck and congratulations Sven! I'm sure he'll be around many of the GPFS events in the future. Bob Oesterlin Sr Principal Storage Engineer, Nuance
From Kevin.Buterbaugh at Vanderbilt.Edu Thu Aug 2 21:31:39 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 2 Aug 2018 20:31:39 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?
In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu>
Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway, we have done some testing which has shown that a 4 MB block size is best for those workloads that use "normal" sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That's why we settled on 1 MB as a compromise. So we're very eager to now test with GPFS 5, a 4 MB block size, and an 8K fragment size. I'm recreating my test cluster filesystem now with that config, so 4 MB block size on the metadata-only system pool, too. Thanks to all who took the time to respond to this thread. I hope it's been beneficial to others as well. Kevin --
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment. We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. One thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com
----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM
Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems...
OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Aug 2 22:14:51 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 2 Aug 2018 21:14:51 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> Message-ID: OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? Kevin /root/gpfs root at testnsd1# mmdelfs gpfs5 All data on the following disks of gpfs5 will be destroyed: test21A3nsd test21A4nsd test21B3nsd test21B4nsd test23Ansd test23Bnsd test23Cnsd test24Ansd test24Bnsd test24Cnsd test25Ansd test25Bnsd test25Cnsd Completed deletion of file system /dev/gpfs5. mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 12 % complete on Thu Aug 2 13:16:26 2018 25 % complete on Thu Aug 2 13:16:31 2018 38 % complete on Thu Aug 2 13:16:36 2018 50 % complete on Thu Aug 2 13:16:41 2018 62 % complete on Thu Aug 2 13:16:46 2018 74 % complete on Thu Aug 2 13:16:52 2018 85 % complete on Thu Aug 2 13:16:57 2018 96 % complete on Thu Aug 2 13:17:02 2018 100 % complete on Thu Aug 2 13:17:03 2018 Creating Allocation Maps Creating Log Files 3 % complete on Thu Aug 2 13:17:09 2018 28 % complete on Thu Aug 2 13:17:15 2018 53 % complete on Thu Aug 2 13:17:20 2018 78 % complete on Thu Aug 2 13:17:26 2018 100 % complete on Thu Aug 2 13:17:27 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 98 % complete on Thu Aug 2 13:17:34 2018 100 % complete on Thu Aug 2 13:17:34 2018 Formatting Allocation Map for storage pool raid1 52 % complete on Thu Aug 2 13:17:39 2018 100 % complete on Thu Aug 2 13:17:43 2018 Formatting Allocation Map for storage pool raid6 24 % complete on Thu Aug 2 13:17:48 2018 50 % complete on Thu Aug 2 13:17:53 2018 74 % complete on Thu Aug 2 13:17:58 2018 99 % complete on Thu Aug 2 13:18:03 2018 100 % complete on Thu Aug 2 13:18:03 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmlsfs gpfs5 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Thu Aug 2 13:16:47 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority /root/gpfs root at testnsd1# ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L > wrote: Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... 
OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Aug 3 07:01:42 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 3 Aug 2018 06:01:42 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: Message-ID: Can u share your stanza file ? Von meinem iPhone gesendet > Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L : > > OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? > > Kevin > > /root/gpfs > root at testnsd1# mmdelfs gpfs5 > All data on the following disks of gpfs5 will be destroyed: > test21A3nsd > test21A4nsd > test21B3nsd > test21B4nsd > test23Ansd > test23Bnsd > test23Cnsd > test24Ansd > test24Bnsd > test24Cnsd > test25Ansd > test25Bnsd > test25Cnsd > Completed deletion of file system /dev/gpfs5. > mmdelfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > /root/gpfs > root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M > > The following disks of gpfs5 will be formatted on node testnsd3: > test21A3nsd: size 953609 MB > test21A4nsd: size 953609 MB > test21B3nsd: size 953609 MB > test21B4nsd: size 953609 MB > test23Ansd: size 15259744 MB > test23Bnsd: size 15259744 MB > test23Cnsd: size 1907468 MB > test24Ansd: size 15259744 MB > test24Bnsd: size 15259744 MB > test24Cnsd: size 1907468 MB > test25Ansd: size 15259744 MB > test25Bnsd: size 15259744 MB > test25Cnsd: size 1907468 MB > Formatting file system ... > Disks up to size 8.29 TB can be added to storage pool system. > Disks up to size 16.60 TB can be added to storage pool raid1. > Disks up to size 132.62 TB can be added to storage pool raid6. 
> Creating Inode File > 12 % complete on Thu Aug 2 13:16:26 2018 > 25 % complete on Thu Aug 2 13:16:31 2018 > 38 % complete on Thu Aug 2 13:16:36 2018 > 50 % complete on Thu Aug 2 13:16:41 2018 > 62 % complete on Thu Aug 2 13:16:46 2018 > 74 % complete on Thu Aug 2 13:16:52 2018 > 85 % complete on Thu Aug 2 13:16:57 2018 > 96 % complete on Thu Aug 2 13:17:02 2018 > 100 % complete on Thu Aug 2 13:17:03 2018 > Creating Allocation Maps > Creating Log Files > 3 % complete on Thu Aug 2 13:17:09 2018 > 28 % complete on Thu Aug 2 13:17:15 2018 > 53 % complete on Thu Aug 2 13:17:20 2018 > 78 % complete on Thu Aug 2 13:17:26 2018 > 100 % complete on Thu Aug 2 13:17:27 2018 > Clearing Inode Allocation Map > Clearing Block Allocation Map > Formatting Allocation Map for storage pool system > 98 % complete on Thu Aug 2 13:17:34 2018 > 100 % complete on Thu Aug 2 13:17:34 2018 > Formatting Allocation Map for storage pool raid1 > 52 % complete on Thu Aug 2 13:17:39 2018 > 100 % complete on Thu Aug 2 13:17:43 2018 > Formatting Allocation Map for storage pool raid6 > 24 % complete on Thu Aug 2 13:17:48 2018 > 50 % complete on Thu Aug 2 13:17:53 2018 > 74 % complete on Thu Aug 2 13:17:58 2018 > 99 % complete on Thu Aug 2 13:18:03 2018 > 100 % complete on Thu Aug 2 13:18:03 2018 > Completed creation of file system /dev/gpfs5. > mmcrfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > /root/gpfs > root at testnsd1# mmlsfs gpfs5 > flag value description > ------------------- ------------------------ ----------------------------------- > -f 8192 Minimum fragment (subblock) size in bytes (system pool) > 32768 Minimum fragment (subblock) size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of metadata replicas > -M 3 Maximum number of metadata replicas > -r 1 Default number of data replicas > -R 3 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 32 Estimated number of nodes that will mount file system > -B 1048576 Block size (system pool) > 4194304 Block size (other pools) > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Thu Aug 2 13:16:47 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea Yes Fast external attributes enabled? > --encryption No Encryption enabled? > --inode-limit 101095424 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per full block > -P system;raid1;raid6 Disk storage pools in file system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? 
> -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs5 Default mount point > --mount-priority 0 Mount priority > /root/gpfs > root at testnsd1# > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > >> On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L wrote: >> >> Hi All, >> >> Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. >> >> Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. >> >> So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. >> >> Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? >> >> Kevin >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 >> >>> On Aug 1, 2018, at 7:11 PM, Andrew Beattie wrote: >>> >>> I too would second the comment about doing testing specific to your environment >>> >>> We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. >>> >>> We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. >>> >>> Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance >>> >>> which is probably why 32k sub block was the default for so many years .... >>> Andrew Beattie >>> Software Defined Storage - IT Specialist >>> Phone: 614-2133-7927 >>> E-mail: abeattie at au1.ibm.com >>> >>> >>> ----- Original message ----- >>> From: "Marc A Kaplan" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >>> Cc: >>> Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? >>> Date: Thu, Aug 2, 2018 10:01 AM >>> >>> Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. >>> >>> Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. 
Sometimes they are available via commands, and/or configuration settings, sometimes not. >>> >>> Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". >>> >>> Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. >>> Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... >>> >>> OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Fri Aug 3 07:53:31 2018 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Fri, 3 Aug 2018 08:53:31 +0200 Subject: [gpfsug-discuss] Sven, the man with the golden gun now at DDN Message-ID: FYI - Sven is on a TOP secret mission called "Skyfall"; with his spirit, super tech skills and know-how he will educate and convert all the poor Lustre souls which are fighting for the world leadership. The GPFS-Q-team in Poughkeepsie has prepared him a golden Walther PPK (9mm) with lot's of Scale v5. silver bullets. He was given a top secret make_all_kind_of_I/O faster debugger with auto tuning features. And off course he received a new car by Aston Martin with lot's of special features designed by POK. It has dual V20-cores, lots of RAM, a Mestor-transmission, twin-port RoCE turbochargers, AFM Rockets and LROC escape seats. Poughkeepsie is still in the process to hire a larger group of smart and good looking NMVeOF I/O girls; feel free to send your ideas and pictures. The list of selected "Sven Girls" with be published in a new section in the Scale FAQ. -frank- -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Aug 3 13:49:48 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 3 Aug 2018 12:49:48 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: Message-ID: <11A27CF3-7484-45A8-ACFB-82B1F772A99B@vanderbilt.edu> Hi All, Aargh - now I really do feel like an idiot! 
I had set up the stanza file over a week ago ? then had to work on production issues ? and completely forgot about setting the block size in the pool stanzas there. But at least we all now know that stanza files override command line arguments to mmcrfs. My apologies? Kevin On Aug 3, 2018, at 1:01 AM, Olaf Weiser > wrote: Can u share your stanza file ? Von meinem iPhone gesendet Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L >: OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? Kevin /root/gpfs root at testnsd1# mmdelfs gpfs5 All data on the following disks of gpfs5 will be destroyed: test21A3nsd test21A4nsd test21B3nsd test21B4nsd test23Ansd test23Bnsd test23Cnsd test24Ansd test24Bnsd test24Cnsd test25Ansd test25Bnsd test25Cnsd Completed deletion of file system /dev/gpfs5. mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 12 % complete on Thu Aug 2 13:16:26 2018 25 % complete on Thu Aug 2 13:16:31 2018 38 % complete on Thu Aug 2 13:16:36 2018 50 % complete on Thu Aug 2 13:16:41 2018 62 % complete on Thu Aug 2 13:16:46 2018 74 % complete on Thu Aug 2 13:16:52 2018 85 % complete on Thu Aug 2 13:16:57 2018 96 % complete on Thu Aug 2 13:17:02 2018 100 % complete on Thu Aug 2 13:17:03 2018 Creating Allocation Maps Creating Log Files 3 % complete on Thu Aug 2 13:17:09 2018 28 % complete on Thu Aug 2 13:17:15 2018 53 % complete on Thu Aug 2 13:17:20 2018 78 % complete on Thu Aug 2 13:17:26 2018 100 % complete on Thu Aug 2 13:17:27 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 98 % complete on Thu Aug 2 13:17:34 2018 100 % complete on Thu Aug 2 13:17:34 2018 Formatting Allocation Map for storage pool raid1 52 % complete on Thu Aug 2 13:17:39 2018 100 % complete on Thu Aug 2 13:17:43 2018 Formatting Allocation Map for storage pool raid6 24 % complete on Thu Aug 2 13:17:48 2018 50 % complete on Thu Aug 2 13:17:53 2018 74 % complete on Thu Aug 2 13:17:58 2018 99 % complete on Thu Aug 2 13:18:03 2018 100 % complete on Thu Aug 2 13:18:03 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
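(For anyone who runs into the same surprise: the override Kevin describes comes from the pool definitions inside the stanza file passed with -F. A minimal sketch of what such a pool stanza can look like follows, with purely hypothetical values rather than Kevin's actual ~/gpfs/gpfs5.stanza, and with the %nsd entries and any other %pool attributes omitted:

%pool: pool=system blockSize=1M
%pool: pool=raid1 blockSize=4M
%pool: pool=raid6 blockSize=4M

With an explicit blockSize= such as the 1M above present in the stanza, that value wins over the -B and --metadata-block-size options given on the mmcrfs command line, which is consistent with the 1 MB system-pool block size seen in the mmlsfs output in this thread.)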
/root/gpfs root at testnsd1# mmlsfs gpfs5 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Thu Aug 2 13:16:47 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority /root/gpfs root at testnsd1# ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L > wrote: Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C89b5017f862b465a9ee908d5f9069a29%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688729119843837&sdata=0vjRu2TsZ5%2Bf84Sb7%2BTEdi8%2BmLGGpbqq%2FXNg2zfJRiw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Aug 3 20:37:50 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 3 Aug 2018 12:37:50 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 Message-ID: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> All, Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: ? the draft agenda (bottom of page), ? a link to registration, register by September 1 due to ORNL site requirements (see next line) ? an important note about registration requirements for going to Oak Ridge National Lab ? a request for your site presentations ? information about HPCXXL and who to contact for information about joining, and ? other upcoming events. Hope you can attend and see Summit and Alpine first hand. Best, Kristy Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. About HPCXXL: HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. 
We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scalable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact HPCXXL president Michael Stephan at m.stephan at fz-juelich.de.

Other upcoming GPFS/SS events:
Sep 19+20  HPCXXL, Oak Ridge
Aug 10  Meetup along TechU, Sydney
Oct 24  NYC User Meeting, New York
Nov 11  SC, Dallas
Dec 12  CIUK, Manchester

Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/

Wednesday 19th, 2018
13:00-13:15 (15 min)  Welcome - TBD
13:15-13:45 (30 min)  What is new in Spectrum Scale? - Chris Maestas (IBM)
13:45-14:00 (15 min)  What is new in ESS? - TBD (IBM)
14:00-14:25 (25 min)  Spinning up a Hadoop cluster on demand - TBD (IBM)
14:25-14:50 (25 min)  Running Container on a Super Computer - John Lewars (IBM)
14:50-15:20 (30 min)  === BREAK ===
15:20-15:40 (20 min)  AWE - *** TO BE CONFIRMED ***
15:40-16:00 (20 min)  CSCS site report - *** TO BE CONFIRMED ***
16:00-16:20 (20 min)  Starfish (Sponsor talk) - TBD (Starfish)
16:20-16:50 (30 min)  Network Flow - John Lewars (IBM)
16:50-17:20 (30 min)  RFEs - Carl Zetie (IBM)
17:20-17:30 (10 min)  Wrap-up - TBD

Thursday 20th, 2018
08:30-08:50 (20 min)  Alpine - the Summit file system - TBD (ORNL)
08:50-09:20 (30 min)  Performance enhancements for CORAL - TBD (IBM)
09:20-09:40 (20 min)  ADIOS I/O library - William Godoy (ORNL)
09:40-10:00 (20 min)  AI Reference Architecture - Ted Hoover (IBM)
10:00-10:30 (30 min)  === BREAK ===
10:30-11:00 (30 min)  Encryption on the wire and on rest - Sandeep Ramesh (IBM)
11:00-11:30 (30 min)  Service Update - *** TO BE CONFIRMED ***
11:30-12:00 (30 min)  Open Forum - All

-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From Kevin.Buterbaugh at Vanderbilt.Edu Mon Aug 6 19:34:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 6 Aug 2018 18:34:34 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: Message-ID: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu>
Hi All, So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do look at the documentation even if it seems sometimes that I don't!) for some other information and happened to come across this at the bottom of page 358: The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a different block size from the user pools. This can be especially beneficial if the default block size is larger than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly disks.
Given that one of the responses I received during this e-mail thread was from an IBM engineer basically pointing out that there is no benefit in setting the metadata-block-size to less than 4 MB if that's what I want for the filesystem block size, this might be a candidate for a documentation update. Thanks... Kevin --
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From hnguyen at cray.com Mon Aug 6 20:52:28 2018 From: hnguyen at cray.com (Hoang Nguyen) Date: Mon, 6 Aug 2018 19:52:28 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> References: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> Message-ID: <7A96225E-B939-411F-B4C4-458DD4470B4D@cray.com>
That comment in the Administration guide is a legacy comment from when the Metadata sub-block size was restricted to 1/32 of the Metadata block size. In the past, creating large Metadata block sizes also meant large sub-blocks and hence large directory blocks, which wasted a lot of space.
From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Monday, August 6, 2018 at 11:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?
Hi All, So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do look at the documentation even if it seems sometimes that I don't!) for some other information and happened to come across this at the bottom of page 358: The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a different block size from the user pools. This can be especially beneficial if the default block size is larger than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly disks. Given that one of the responses I received during this e-mail thread was from an IBM engineer basically pointing out that there is no benefit in setting the metadata-block-size to less than 4 MB if that's what I want for the filesystem block size, this might be a candidate for a documentation update. Thanks... Kevin --
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From Kevin.Buterbaugh at Vanderbilt.Edu Mon Aug 6 22:42:54 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 6 Aug 2018 21:42:54 +0000 Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu>
Hi All, So I'm _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time.
However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From esperle at us.ibm.com Mon Aug 6 23:46:39 2018 From: esperle at us.ibm.com (Eric Sperley) Date: Mon, 6 Aug 2018 15:46:39 -0700 Subject: [gpfsug-discuss] mmaddcallback documentation issue In-Reply-To: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> References: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> Message-ID: See if this helps https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_mmaddcallback.htm Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From peter.chase at metoffice.gov.uk Tue Aug 7 12:35:17 2018 From: peter.chase at metoffice.gov.uk (Chase, Peter) Date: Tue, 7 Aug 2018 11:35:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue Message-ID: Hi Kevin, I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are not for your version of Spectrum Scale) state that mmstartpolicy is for internal GPFS use only: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Using+Policies So if the above link is correct, I'd recommend switching to using mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy doesn't and might have you fumbling around in the dark. As for the issue you're experiencing with adding a callback, it looks like the mmaddcallback command is catching the --single-instance flag as an argument for it, not as a parameter for the mmstartpolicy command. After looking at the documentation you've referenced, I suspect that there's a typo/omission in the command and it should have a trailing double quote (") on the end of the parms argument list, i.e.: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance" I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. Regards, Pete Chase peter.chase at metoffice.gov.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: 06 August 2018 23:47 To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. mmaddcallback documentation issue (Buterbaugh, Kevin L) 2. Re: mmaddcallback documentation issue (Eric Sperley) ---------------------------------------------------------------------- Message: 1 Date: Mon, 6 Aug 2018 21:42:54 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037 at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. 
However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Mon, 6 Aug 2018 15:46:39 -0700 From: "Eric Sperley" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: Content-Type: text/plain; charset="utf-8" See if this helps https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_mmaddcallback.htm Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 21 ********************************************** From UWEFALKE at de.ibm.com Tue Aug 7 13:30:48 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 7 Aug 2018 14:30:48 +0200 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: "I'm not sure how we go about asking IBM to correct their documentation,..." One way would be to open a PMR, er?, case. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From Kevin.Buterbaugh at Vanderbilt.Edu Tue Aug 7 17:14:27 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 7 Aug 2018 16:14:27 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: <3F1F205C-B3EB-44CF-BC47-84FDF335FBEF@vanderbilt.edu> Hi All, I was able to navigate down thru IBM?s website and find the GPFS 5.0.1 manuals but they contain the same typo, which Pete has correctly identified ? and I have confirmed that his solution works. Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 7, 2018, at 6:35 AM, Chase, Peter > wrote: Hi Kevin, I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are not for your version of Spectrum Scale) state that mmstartpolicy is for internal GPFS use only: https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral%2BParallel%2BFile%2BSystem%2B(GPFS)%2Fpage%2FUsing%2BPolicies&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912985631&sdata=4PmYIvmKenhqtLRVhusaQpWHAjGcd6YFMkb5nMa%2Bwuw%3D&reserved=0 So if the above link is correct, I'd recommend switching to using mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy doesn't and might have you fumbling around in the dark. As for the issue you're experiencing with adding a callback, it looks like the mmaddcallback command is catching the --single-instance flag as an argument for it, not as a parameter for the mmstartpolicy command. 
After looking at the documentation you've referenced, I suspect that there's a typo/omission in the command and it should have a trailing double quote (") on the end of the parms argument list, i.e.:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance"

I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea.

Regards,

Pete Chase
peter.chase at metoffice.gov.uk
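A minimal sketch of the corrected registration and a quick way to check it, assuming the MIGRATION identifier used in the thread (mmlscallback and mmdelcallback are the companion list/remove commands; names and events are illustrative, adjust to your environment):

# register the lowDiskSpace callback - note the closing quote after --single-instance
mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy \
    --event lowDiskSpace --parms "%eventName %fsName --single-instance"

# confirm the callback and its parms string were recorded as intended
mmlscallback MIGRATION

# remove it again if you decide to drive migrations with mmapplypolicy instead
mmdelcallback MIGRATION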
From carlz at us.ibm.com Tue Aug 7 17:58:45 2018
From: carlz at us.ibm.com (Carl Zetie)
Date: Tue, 7 Aug 2018 16:58:45 +0000
Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue
In-Reply-To: References: Message-ID:

>I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea.

File an RFE against Scale and I will route it to the right place.

Carl Zetie
Offering Manager for Spectrum Scale, IBM
(540) 882 9353, Research Triangle Park
carlz at us.ibm.com

From carlz at us.ibm.com Wed Aug 8 13:24:52 2018
From: carlz at us.ibm.com (Carl Zetie)
Date: Wed, 8 Aug 2018 12:24:52 +0000
Subject: [gpfsug-discuss] Easy way to submit Documentation corrections and enhancements
Message-ID:

It turns out that there is an easier, faster way to submit corrections and enhancements to the Scale documentation than sending me an RFE. At the bottom of each page in the Knowledge Center, there is a Comments section. You just need to be signed in under your IBM ID to add a comment. And all of the comments are read and processed by our information design team.

regards,

Carl Zetie
Offering Manager for Spectrum Scale, IBM
(540) 882 9353, Research Triangle Park
carlz at us.ibm.com

From ulmer at ulmer.org Thu Aug 9 05:46:12 2018
From: ulmer at ulmer.org (Stephen Ulmer)
Date: Thu, 9 Aug 2018 00:46:12 -0400
Subject: [gpfsug-discuss] Sven Oehme now at DDN
In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu>
Message-ID: <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org>

But it still shows him employed at IBM through "present". So is he on-loan or is it "permanent"?

-- Stephen

> On Aug 2, 2018, at 11:56 AM, Marc A Kaplan wrote:
>
> https://www.linkedin.com/in/oehmes/
> Apparently, Sven is now "Chief Research Officer at DDN"

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From olaf.weiser at de.ibm.com Thu Aug 9 06:07:53 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 9 Aug 2018 07:07:53 +0200 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 9 14:18:40 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 9 Aug 2018 09:18:40 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu><151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> Message-ID: https://en.wikipedia.org/wiki/Coopetition -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Aug 9 20:11:27 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 9 Aug 2018 15:11:27 -0400 Subject: [gpfsug-discuss] logAssertFailed question Message-ID: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> Howdy All, We recently had a node running 4.2.3.6 (efix 9billion, sorry can't remember the exact efix) go wonky with a logAssertFailed error that looked similar to the description of this APAR fixed in 4.2.3.8: - Fix an assert in BufferDesc::flushBuffer Assert exp(!addrDirty || synchedStale || allDirty inode 554192 block 10 addrDirty 1 synchedStale 0 allDirty 0 that can happen during shutdown IJ04520 The odd thing is that APAR mentions the error can happen at shutdown and this node wasn't shutting down. In this APAR, can the error also occur when the node is not shutting down? 
Here's the head of the error we saw: Thu Aug 9 11:06:53.977 2018: [X] logAssertFailed: !addrDirty || synchedStale || allDirty Thu Aug 9 11:06:53.978 2018: [X] return code 0, reason code 0, log record tag 0 Thu Aug 9 11:06:57.557 2018: [X] *** Assert exp(!addrDirty || synchedStale || allDirty inode 96666844 snap 0 block 2034 bdP 0x1802F51DE40 addrDirty 1 synchedStale 0 allDirty 0 validBits 3x0-000000000003FFFF dirtyBits 3x0-000000000003FFFF ) in line 7316 of file /build/ode/ttn423ptf6/src/avs/fs/mmfs/ts/fs/bufdesc.C Thu Aug 9 11:06:57.558 2018: [E] *** Traceback: Thu Aug 9 11:06:57.559 2018: [E] 2:0x555555D6A016 logAssertFailed + 0x1B6 at ??:0 Thu Aug 9 11:06:57.560 2018: [E] 3:0x55555594B333 BufferDesc::flushBuffer(int, long long*) + 0x14A3 at ??:0 Thu Aug 9 11:06:57.561 2018: [E] 4:0x555555B483CE GlobalFS::LookForCleanToDo() + 0x2DE at ??:0 Thu Aug 9 11:06:57.562 2018: [E] 5:0x555555B48524 BufferCleanerBody(void*) + 0x74 at ??:0 Thu Aug 9 11:06:57.563 2018: [E] 6:0x555555868556 Thread::callBody(Thread*) + 0x46 at ??:0 Thu Aug 9 11:06:57.564 2018: [E] 7:0x555555855AF2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Thu Aug 9 11:06:57.565 2018: [E] 8:0x7FFFF79C5806 start_thread + 0xE6 at ??:0 Thu Aug 9 11:06:57.566 2018: [E] 9:0x7FFFF6B8567D clone + 0x6D at ??:0 mmfsd: /build/ode/ttn423ptf6/src/avs/fs/mmfs/ts/fs/bufdesc.C:7316: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `!addrDirty || synchedStale || allDirty inode 96666844 snap 0 block 2034 bdP 0x1802F51DE40 addrDirty 1 synchedStale 0 allDirty 0 validBits 3x0-000000000003FFFF dirtyBits 3x0-000000000003FFFF ' failed. Thu Aug 9 11:06:57.586 2018: [E] Signal 6 at location 0x7FFFF6AD9875 in process 10775, link reg 0xFFFFFFFFFFFFFFFF. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From valdis.kletnieks at vt.edu Thu Aug 9 20:25:47 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 09 Aug 2018 15:25:47 -0400 Subject: [gpfsug-discuss] logAssertFailed question In-Reply-To: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> References: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> Message-ID: <29489.1533842747@turing-police.cc.vt.edu> On Thu, 09 Aug 2018 15:11:27 -0400, Aaron Knister said: > We recently had a node running 4.2.3.6 (efix 9billion, sorry can't > remember the exact efix) go wonky with a logAssertFailed error that > looked similar to the description of this APAR fixed in 4.2.3.8: > > - Fix an assert in BufferDesc::flushBuffer Assert exp(!addrDirty || > synchedStale || allDirty inode 554192 block 10 addrDirty 1 synchedStale > 0 allDirty 0 that can happen during shutdown IJ04520 Yep. *that* one. Saw it often enough to put a serious crimp in our style. 'logAssertFailed: ! addrDirty || synchedStale || allDirty' It's *totally* possible to hit it in the middle of a production workload. I don't think we ever saw it during shutdown. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL:

From Stephan.Peinkofer at lrz.de Fri Aug 10 12:29:18 2018
From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan)
Date: Fri, 10 Aug 2018 11:29:18 +0000
Subject: [gpfsug-discuss] GPFS Independent Fileset Limit
Message-ID: <298030c14ce94fae8f21aefe9d736b84@lrz.de>

Dear IBM and GPFS List,

we at the Leibniz Supercomputing Centre and our GCS Partners from the Jülich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems.

There are also a number of RFEs from other users open that target this limitation:
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282

I know GPFS Development was very busy fulfilling the CORAL requirements, but maybe now there is again some time to improve something else.

If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above.

Many thanks in advance and have a nice weekend.
Best Regards,
Stephan Peinkofer

From olaf.weiser at de.ibm.com Fri Aug 10 13:51:56 2018
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Fri, 10 Aug 2018 14:51:56 +0200
Subject: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: <298030c14ce94fae8f21aefe9d736b84@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

An HTML attachment was scrubbed...
URL:

From makaplan at us.ibm.com Fri Aug 10 14:02:33 2018
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Fri, 10 Aug 2018 09:02:33 -0400
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

Questions: How/why was the decision made to use a large number (~1000) of independent filesets?

What functions/features/commands are being used that work with independent filesets, that do not also work with "dependent" filesets?

From m.lischewski at fz-juelich.de Fri Aug 10 15:25:17 2018
From: m.lischewski at fz-juelich.de (Martin Lischewski)
Date: Fri, 10 Aug 2018 16:25:17 +0200
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

Hello Olaf, hello Marc,

we in Jülich are in the middle of migrating/copying all our old filesystems, which were created with filesystem version 13.23 (3.5.0.7), to new filesystems created with GPFS 5.0.1.

We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota".

The idea is to create a separate fileset for each group/project. For the users the quota computation should be much more transparent. From now on all data which is stored inside of their directory (fileset) counts for their quota, independent of the ownership.

Right now we have round about 900 groups, which means we will create round about 900 filesets per filesystem. In one filesystem we will have about 400 million inodes (with rising tendency).
This filesystem we will back up with "mmbackup", so we talked with Dominic Mueller-Wicke and he recommended that we use independent filesets, because then the policy runs can be parallelized and we can increase the backup performance. We believe that we require these parallelized policy runs to meet our backup performance targets.

But there are even more features we enable by using independent filesets, e.g. "fileset level snapshots" and "user and group quotas inside of a fileset".

I did not know about performance issues regarding independent filesets... Can you give us some more information about this?

All in all we strongly support the idea of increasing this limit.

Do I understand correctly that by opening a PMR, IBM allows this limit to be increased for specific sites? I would rather see the limit increased and made officially public and supported.

Regards,

Martin

Am 10.08.2018 um 14:51 schrieb Olaf Weiser:
> Hallo Stephan,
> the limit is not a hard coded limit - technically spoken, you can
> raise it easily.
> But as always, it is a question of test 'n support ..
>
> I've seen customer cases, where the use of much smaller amount of
> independent filesets generates a lot performance issues, hangs ... at
> least noise and partial trouble ..
> it might be not the case with your specific workload, because due to
> the fact, that you 're running already close to 1000 ...
>
> I suspect , this number of 1000 file sets - at the time of
> introducing it - was as also just that one had to pick a number...
>
> ... turns out.. that a general commitment to support > 1000
> ind.fileset is more or less hard.. because what uses cases should we
> test / support
> I think , there might be a good chance for you , that for your
> specific workload, one would allow and support more than 1000
>
> do you still have a PMR for your side for this ? - if not - I know ..
> open PMRs is an additional ...but could you please ..
> then we can decide .. if raising the limit is an option for you ..
>
> Mit freundlichen Grüßen / Kind regards
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage Platform
> IBM Deutschland, IBM Allee 1, 71139 Ehningen
> Phone: +49-170-579-44-66
> E-Mail: olaf.weiser at de.ibm.com
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
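A rough sketch of the per-project fileset setup described above, assuming a hypothetical file system fs0 and project fileset projA (all names and limits here are placeholders); per-user quotas inside a fileset additionally require per-fileset quota enforcement to be enabled on the file system:

# independent fileset with its own inode space, linked under /fs0/projects
mmcrfileset fs0 projA --inode-space new --inode-limit 1000000:200000
mmlinkfileset fs0 projA -J /fs0/projects/projA

# fileset quota: counts everything stored in the fileset, regardless of ownership
mmsetquota fs0:projA --block 20T:22T --files 1000000:1100000

# per-user quota inside the fileset (assumes "mmchfs fs0 --perfileset-quota" was run)
mmsetquota fs0:projA --user alice --block 1T:2T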
From Stephan.Peinkofer at lrz.de Fri Aug 10 16:14:46 2018
From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan)
Date: Fri, 10 Aug 2018 15:14:46 +0000
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

Dear Marc,

well, the primary reasons for us are:
- Per-fileset quota (this seems to work also for dependent filesets as far as I know)
- Per-user per-fileset quota (this seems to work only for independent filesets)
- The dedicated inode space to speed up policy runs which only have to be applied to a specific subpart of the file system
- Scaling mmbackup economically by backing up different filesets to different TSM servers

We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases):

HPC WORK: Here every project has - for the lifetime of the project - a dedicated storage area that has some fileset quota attached to it, but no further per-user or per-group quotas are applied here. No backup is taken.

Data Science Storage: This is for long-term online and collaborative storage. Here projects can get so-called "DSS Containers" to which they can give arbitrary users access via a Self Service Interface (a little bit like Dropbox). Each of these DSS Containers is implemented as an independent fileset, so that projects can also specify a per-user quota for invited users, we can back up each container efficiently into a different TSM node via mmbackup, and we can run different actions against a DSS Container using mmapplypolicy. We also plan to offer our users the option to enable snapshots on their containers if they wish. We currently deploy a 2PB file system for this and are in the process of bringing up two additional 10PB file systems, but we already have requests asking what it would mean if we have to scale this to 50PB.

Data Science Archive (Planned): This is for long-term archive storage. The usage model will be something similar to DSS, but underneath we plan to use TSM/HSM.
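A hedged sketch of the mmbackup and policy scoping just described, with made-up names only (a file system fs0, a DSS container fileset linked at /fs0/dss0001, a TSM server TSMSRV_A, and a policy file dss0001.pol):

# incremental backup of this one independent fileset only, sent to its own TSM server
mmbackup /fs0/dss0001 -t incremental --scope inodespace --tsm-servers TSMSRV_A

# policy run restricted to the same inode space (test mode, nothing is changed)
mmapplypolicy /fs0/dss0001 -P dss0001.pol --scope inodespace -I test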
Another point where people might hit the limit - though I don't remember it completely off the top of my head - is when they are using your OpenStack Manila integration. As I think your Manila driver creates an independent fileset for each network share in order to provide the per-share snapshot feature, someone trying to use ISS as Manila storage in a bigger OpenStack cloud might hit the 1000-fileset limit as well.

Best Regards,
Stephan Peinkofer

From Stephan.Peinkofer at lrz.de Fri Aug 10 16:39:50 2018
From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan)
Date: Fri, 10 Aug 2018 15:39:50 +0000
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

Dear Olaf,

I know that this is "just" a "support" limit. However, Sven once told me at a UG meeting in Ehningen that there is more to this than just adjusting your QA qualification tests, since the way it is implemented today does not really scale ;). That's probably the reason why you said you sometimes see problems when you are not even close to the limit.

So if you look at the 250PB Alpine file system of Summit today, that is what's going to be deployed at more than one site worldwide in 2-4 years, and imho independent filesets are a great way to make these large systems much more manageable while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed altogether.

Best Regards,
Stephan Peinkofer
From bbanister at jumptrading.com Fri Aug 10 16:51:28 2018
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 10 Aug 2018 15:51:28 +0000
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID: <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com>

This is definitely a great candidate for an RFE, if one does not already exist.

Not to try and contradict my friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general; the RFE process is really the main way to do this.

I just got off a call with Kristie and Carl about the RFE process, and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!!
So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE process (admittedly not currently great) really is, and that it will be a great way to work together on these common goals and needs for the product we rely so heavily upon!

Cheers!!
-Bryan
From djohnson at osc.edu Fri Aug 10 16:22:23 2018
From: djohnson at osc.edu (Doug Johnson)
Date: Fri, 10 Aug 2018 11:22:23 -0400
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

Hi all,

I want to chime in because this is precisely what we have done at OSC due to the same motivations Janell described. Our design was based in part on the guidelines in the "Petascale Data Protection" white paper from IBM.
We only have ~200 filesets and 250M inodes today, but expect to grow. We are also very interested in details about performance issues and independent filesets. Can IBM elaborate?

Best,
Doug
From bbanister at jumptrading.com Fri Aug 10 17:01:17 2018
From: bbanister at jumptrading.com (Bryan Banister)
Date: Fri, 10 Aug 2018 16:01:17 +0000
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com>
Message-ID: <01780289b9e14e599f848f78b33998d8@jumptrading.com>

Just as a follow-up to my own note: Stephan already provided a list of existing RFEs to vote for through the IBM RFE site, cheers,
-Bryan
From makaplan at us.ibm.com Fri Aug 10 18:15:34 2018
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Fri, 10 Aug 2018 13:15:34 -0400
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>
Message-ID:

I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really?

Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up.

Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits.

From anobre at br.ibm.com Fri Aug 10 19:10:35 2018
From: anobre at br.ibm.com (Anderson Ferreira Nobre)
Date: Fri, 10 Aug 2018 18:10:35 +0000
Subject: [gpfsug-discuss] Top files on GPFS filesystem
Message-ID:

An HTML attachment was scrubbed...
URL:

From jake.carroll at uq.edu.au Sat Aug 11 03:18:28 2018
From: jake.carroll at uq.edu.au (Jake Carroll)
Date: Sat, 11 Aug 2018 02:18:28 +0000
Subject: [gpfsug-discuss] GPFS Independent Fileset Limit
Message-ID:

Just to chime in on this...

We have experienced a lot of problems as a result of the independent fileset limitation @ 1000.
We have a very large campus wide deployment that relies upon filesets for collection management of large (and small) scientific data outputs. Every human who uses our GPFS AFM fabric gets a "collection", which is an independent fileset. Some may say this was an unwise design choice - but it was deliberate and related to security, namespace and inode isolation. It is a considered decision. Just not considered _enough_ given the 1000 fileset limit ;).

We've even had to go as far as re-organising entire filesystems (splitting things apart) to sacrifice performance (less spindles for the filesets on top of a filesystem) to work around it - and sometimes spill into entirely new arrays.

I've had it explained to me by internal IBM staff *why* it is hard to fix the fileset limits - and it isn't as straightforward as people think - especially in our case where each fileset is an AFM cache/home relationship - but we desperately need a solution. We logged an RFE. Hopefully others do, also.

The complexity has been explained to me by a very good colleague who has helped us a great deal inside IBM (name withheld to protect the innocent) as a knock on effect of the computational overhead and expense of things _associated_ with independent filesets, like recursing a snapshot tree. So - it really isn't as simple as things appear on the surface - but it doesn't mean we shouldn't try to fix it, I suppose!

We'd love to see this improved, too - as it's currently making things difficult.

Happy to collaborate and work together on this, as always.

-jc
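For illustration only - a sketch of the kind of per-collection AFM fileset described above, with made-up names and an assumed NFS-exported home path (real deployments will differ in AFM mode, target and layout):

# one independent fileset per collection, created as a single-writer AFM cache
mmcrfileset fs0 collection0123 --inode-space new \
    -p afmMode=single-writer \
    -p afmTarget=nfs://homecluster/gpfs/home0/collections/collection0123
mmlinkfileset fs0 collection0123 -J /fs0/collections/collection0123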
> > I did not know about performance issues regarding independent > filesets... Can you give us some more information about this? > > All in all we are strongly supporting the idea of increasing this limit. > > Do I understand correctly that by opening a PMR IBM allows to increase > this limit on special sides? I would rather like to increase the limit and make it official public available and supported. > > Regards, > > Martin > > Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > > Hallo Stephan, > the limit is not a hard coded limit - technically spoken, you can raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of > independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. > it might be not the case with your specific workload, because due to > the fact, that you 're running already close to 1000 ... > > I suspect , this number of 1000 file sets - at the time of > introducing it - was as also just that one had to pick a number... > > ... turns out.. that a general commitment to support > 1000 > ind.fileset is more or less hard.. because what uses cases should we > test / support I think , there might be a good chance for you , that > for your specific workload, one would allow and support more than > 1000 > > do you still have a PMR for your side for this ? - if not - I know .. > open PMRs is an additional ...but could you please .. > then we can decide .. if raising the limit is an option for you .. > > Mit freundlichen Gr??en / Kind regards > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ---------------------------------------------------------------------- > --------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > > ---------------------------------------------------------------------- > --------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, > Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE > 99369940 > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron > , Dorian Krause > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- > ----------------------------- > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the > J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
> > There are also a number of RFEs from other users open, that target this limitation: > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 56780 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 120534 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 106530 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 85282 > > I know GPFS Development was very busy fulfilling the CORAL > requirements but maybe now there is again some time to improve something else. > > If there are any other users on the list that are approaching the > current limitation in independent filesets, please take some time and vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Fri, 10 Aug 2018 16:01:17 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Message-ID: <01780289b9e14e599f848f78b33998d8 at jumptrading.com> Content-Type: text/plain; charset="iso-8859-1" Just as a follow up to my own note, Stephan, already provided a list of existing RFEs from which to vote through the IBM RFE site, cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Bryan Banister Sent: Friday, August 10, 2018 10:51 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ This is definitely a great candidate for a RFE, if one does not already exist. Not to try and contradict by friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general, which the RFE process is really the main way to do this. I just got off a call with Kristie and Carl about the RFE process and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!! So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE (admittedly currently got great) process really is and will be a great way to work together on these common goals and needs for the product we rely so heavily upon! Cheers!! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peinkofer, Stephan Sent: Friday, August 10, 2018 10:40 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. 
So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. if raising the limit is an option for you .. Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke >, Uwe Tron >, Dorian Krause > Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 29 ********************************************** From Stephan.Peinkofer at lrz.de Sat Aug 11 08:03:13 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Sat, 11 Aug 2018 07:03:13 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, , Message-ID: <28219001a90040d489e7269aa20fc4ae@lrz.de> Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Aug 12 14:05:53 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 12 Aug 2018 09:05:53 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <28219001a90040d489e7269aa20fc4ae@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, , <28219001a90040d489e7269aa20fc4ae@lrz.de> Message-ID: That's interesting, I confess I never read that piece of documentation. 
What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Aug 13 07:10:04 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 13 Aug 2018 08:10:04 +0200 Subject: [gpfsug-discuss] Top files on GPFS filesystem In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 5698 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From Stephan.Peinkofer at lrz.de Mon Aug 13 08:26:00 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Mon, 13 Aug 2018 07:26:00 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> Message-ID: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Dear Marc, OK, so let?s give it a try: [root at datdsst100 pr74qo]# mmlsfileset dsstestfs01 Filesets in file system 'dsstestfs01': Name Status Path root Linked /dss/dsstestfs01 ... quota_test_independent Linked /dss/dsstestfs01/quota_test_independent quota_test_dependent Linked /dss/dsstestfs01/quota_test_independent/quota_test_dependent [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_independent --user a2822bp --block 1G:1G --files 10:10 [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_dependent --user a2822bp --block 10G:10G --files 100:100 [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 0 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i Looks good ? [root at datdsst100 pr74qo]# cd /dss/dsstestfs01/quota_test_independent/quota_test_dependent/ [root at datdsst100 quota_test_dependent]# for foo in `seq 1 99`; do touch file${foo}; chown a2822bp:pr28fa file${foo}; done [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 99 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i So it seems that per fileset per user quota is really not depending on independence. But what is the documentation then meaning with: >>> User group and user quotas can be tracked at the file system level or per independent fileset. ??? However, there still remains the problem with mmbackup and mmapplypolicy ? And if you look at some of the RFEs, like the one from DESY, they want even more than 10k independent filesets ? Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. 
(TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen Tel: +49(0)89 35831-8715 Fax: +49(0)89 35831-9700 URL: http://www.lrz.de On 12. Aug 2018, at 15:05, Marc A Kaplan > wrote: That's interesting, I confess I never read that piece of documentation. What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Marc A Kaplan > Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. 
--------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Aug 13 08:52:55 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 13 Aug 2018 09:52:55 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Aug 13 16:12:32 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 13 Aug 2018 11:12:32 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... Like many things in life, sometimes compromises are necessary! From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/13/2018 03:26 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, OK, so let?s give it a try: [root at datdsst100 pr74qo]# mmlsfileset dsstestfs01 Filesets in file system 'dsstestfs01': Name Status Path root Linked /dss/dsstestfs01 ... 
quota_test_independent Linked /dss/dsstestfs01/quota_test_independent quota_test_dependent Linked /dss/dsstestfs01/quota_test_independent/quota_test_dependent [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_independent --user a2822bp --block 1G:1G --files 10:10 [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_dependent --user a2822bp --block 10G:10G --files 100:100 [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 0 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i Looks good ? [root at datdsst100 pr74qo]# cd /dss/dsstestfs01/quota_test_independent/quota_test_dependent/ [root at datdsst100 quota_test_dependent]# for foo in `seq 1 99`; do touch file${foo}; chown a2822bp:pr28fa file${foo}; done [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 99 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i So it seems that per fileset per user quota is really not depending on independence. But what is the documentation then meaning with: >>> User group and user quotas can be tracked at the file system level or per independent fileset. ??? However, there still remains the problem with mmbackup and mmapplypolicy ? And if you look at some of the RFEs, like the one from DESY, they want even more than 10k independent filesets ? Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. (TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen Tel: +49(0)89 35831-8715 Fax: +49(0)89 35831-9700 URL: http://www.lrz.de On 12. Aug 2018, at 15:05, Marc A Kaplan wrote: That's interesting, I confess I never read that piece of documentation. What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. 
With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan < makaplan at us.ibm.com> Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Aug 13 19:48:20 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Mon, 13 Aug 2018 18:48:20 +0000 Subject: [gpfsug-discuss] TCP_QUICKACK Message-ID: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_QUICKACK socket flag on Linux? 
I?m debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I?m curious if GPFS is explicitly doing this or if there?s just a timing window in the RPC behavior that just makes it look that way. -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Aug 13 20:25:44 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 13 Aug 2018 15:25:44 -0400 Subject: [gpfsug-discuss] TCP_QUICKACK In-Reply-To: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> References: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> Message-ID: Hi Aaron, I just searched the core GPFS source code. I didn't find TCP_QUICKACK being used explicitly. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" To: gpfsug main discussion list Date: 08/13/2018 02:48 PM Subject: [gpfsug-discuss] TCP_QUICKACK Sent by: gpfsug-discuss-bounces at spectrumscale.org This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_QUICKACK socket flag on Linux? I?m debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I?m curious if GPFS is explicitly doing this or if there?s just a timing window in the RPC behavior that just makes it look that way. -Aaron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Tue Aug 14 01:09:24 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 13 Aug 2018 17:09:24 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> Message-ID: <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> All, don?t forget registration ends on the early side for this event due to background checks, etc. As noted below: IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Hope you?ll be able to attend! Best, Kristy > On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose wrote: > > All, > > Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: > ? the draft agenda (bottom of page), > ? a link to registration, register by September 1 due to ORNL site requirements (see next line) > ? 
an important note about registration requirements for going to Oak Ridge National Lab > ? a request for your site presentations > ? information about HPCXXL and who to contact for information about joining, and > ? other upcoming events. > > Hope you can attend and see Summit and Alpine first hand. > > Best, > Kristy > > Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 > > IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. > > ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. > > About HPCXXL: > HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. > The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. > To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. > > Other upcoming GPFS/SS events: > Sep 19+20 HPCXXL, Oak Ridge > Aug 10 Meetup along TechU, Sydney > Oct 24 NYC User Meeting, New York > Nov 11 SC, Dallas > Dec 12 CIUK, Manchester > > > Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ > Duration Start End Title > > Wednesday 19th, 2018 > > Speaker > > TBD > Chris Maestas (IBM) TBD (IBM) > TBD (IBM) > John Lewars (IBM) > > *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) > John Lewars (IBM) > > Carl Zetie (IBM) TBD > > TBD (ORNL) > TBD (IBM) > William Godoy (ORNL) Ted Hoover (IBM) > > Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All > > 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 > > 13:15 Welcome > 13:45 What is new in Spectrum Scale? > 14:00 What is new in ESS? 
> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === > 15:40 AWE > 16:00 CSCS site report > 16:20 Starfish (Sponsor talk) > 16:50 Network Flow > 17:20 RFEs > 17:30 W rap-up > > Thursday 19th, 2018 > > 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 > > 08:50 Alpine ? the Summit file system > 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library > 10:00 AI Reference Architecture > 10:30 === BREAK === > 11:00 Encryption on the wire and on rest 11:30 Service Update > 12:00 Open Forum > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Tue Aug 14 05:50:43 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Tue, 14 Aug 2018 04:50:43 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Aug 14 07:08:55 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 14 Aug 2018 06:08:55 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> , <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Message-ID: <4830FF9B-A443-4508-A8ED-B023B6EDD15C@huk-coburg.de> +1 great answer Stephan. We also dont understand why funktions are existend, but every time we want to use it, the first step is make a requirement. Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Am 14.08.2018 um 06:50 schrieb Peinkofer, Stephan >: Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 14 16:31:15 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 14 Aug 2018 11:31:15 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Message-ID: True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. 
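A rough sketch of that grouping, with made-up device, fileset and path names: one independent fileset acts as a backup/management group, and per-project dependent filesets are created inside its inode space (syntax may differ slightly by release):

mmcrfileset fs01 bkgrp01 --inode-space new                 # management group with its own inode space
mmlinkfileset fs01 bkgrp01 -J /fs01/bkgrp01
mmcrfileset fs01 projA --inode-space bkgrp01               # dependent fileset sharing bkgrp01's inodes
mmlinkfileset fs01 projA -J /fs01/bkgrp01/projA
mmbackup /fs01/bkgrp01 --scope inodespace -t incremental   # one backup run covers every project in the group

Projects that should not be backed up would be linked under a separate independent fileset that is simply never passed to mmbackup.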
mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Aug 15 12:07:45 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 15 Aug 2018 11:07:45 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? Message-ID: Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? 
Thanks Simon From r.sobey at imperial.ac.uk Wed Aug 15 13:56:28 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 15 Aug 2018 12:56:28 +0000 Subject: [gpfsug-discuss] 5.0.1 and HSM Message-ID: Hi all, Is anyone running HSM who has also upgraded to 5.0.1? I'd be interested to know if it work(s) or if you had to downgrade back to 5.0.0.X or even 4.2.3.X. Officially the website says not supported, but we've been told (not verbatim) there's no reason why it wouldn't. We really don't want to have to upgrade to a Scale 5 release that's already not receiving any more PTFs but we may have to. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Aug 15 14:00:18 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 15 Aug 2018 13:00:18 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? In-Reply-To: References: Message-ID: Sorry, was able to download 5.0.1.1 DME just now, no issues. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson Sent: 15 August 2018 12:08 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 5.0.1-2 release? Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Wed Aug 15 19:37:50 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 15 Aug 2018 18:37:50 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? Message-ID: <65E22DAC-1FCE-424D-BE95-4C0D841194E1@nuance.com> 5.0.1.2 is now on Fix Central. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 8/15/18, 6:07 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=OYGVn5hlqVYT-aqb8EERr85EEm8p19iHHWkSpX7AeKc&s=91moEFA-0zhZicJFFWDd4iO2Wt7GhhuaDi6yvZqigrI&e= From carlz at us.ibm.com Thu Aug 16 13:28:22 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Thu, 16 Aug 2018 12:28:22 +0000 Subject: [gpfsug-discuss] Entitlements issues in Fix Central Message-ID: So... who wants to help us fix Fix Central? Two things: 1. I have seen a handful of issues in the last two weeks similar to what Simon and others have described: some versions of Scale download fine, others not. Some user IDs work, some get denied. And there is no obvious pattern or cause. We are looking at it, and more data points will help us track it down, so it would be a big help if everybody who encounters this reported it to Fix Central support: https://www.ibm.com/support/home/?lnk=fcw 2. An internal project is kicking off to improve Fix Central and Passport Advantage. If anybody would like to be a sponsor user in that project, contact me off-list. I can't guarantee participation, but I would love to get a couple of Scale users into the process. thanks, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From Dwayne.Hart at med.mun.ca Thu Aug 16 13:35:54 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 16 Aug 2018 12:35:54 +0000 Subject: [gpfsug-discuss] Entitlements issues in Fix Central In-Reply-To: References: Message-ID: <81C9FEC6-6BCF-433B-BEDB-B32A9B1A63B0@med.mun.ca> Hi Carl, I have access to both Fix Central and Passport Advantage. I?d like to assist in anyway I can. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 > On Aug 16, 2018, at 9:58 AM, Carl Zetie wrote: > > > So... who wants to help us fix Fix Central? > > Two things: > > 1. I have seen a handful of issues in the last two weeks similar to what Simon and others have described: some versions of Scale download fine, others not. Some user IDs work, some get denied. And there is no obvious pattern or cause. We are looking at it, and more data points will help us track it down, so it would be a big help if everybody who encounters this reported it to Fix Central support: > > https://www.ibm.com/support/home/?lnk=fcw > > > 2. An internal project is kicking off to improve Fix Central and Passport Advantage. If anybody would like to be a sponsor user in that project, contact me off-list. I can't guarantee participation, but I would love to get a couple of Scale users into the process. > > thanks, > > > > > > > > > > > > > > Carl Zetie > Offering Manager for Spectrum Scale, IBM > ---- > (540) 882 9353 ][ Research Triangle Park > carlz at us.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Stephan.Peinkofer at lrz.de Fri Aug 17 12:39:54 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 17 Aug 2018 11:39:54 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? 
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de>, Message-ID: Dear Marc, well as I think I cannot simply "move" dependent filesets between independent ones and our customers must have the opportunity to change data protection policy for their Containers at any given time, I cannot map them to a "backed up" or "not backed up" independent fileset. So how much performance impact is lets say 1-10 exclude.dir directives per independent fileset? Many thanks in advance. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Tuesday, August 14, 2018 5:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? 
;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 17 12:59:56 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 17 Aug 2018 07:59:56 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de><65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de>, Message-ID: My idea, not completely thought out, is that before you hit the 1000 limit, you start putting new customers or projects into dependent filesets and define those new dependent filesets within either a lesser number of independent filesets expressly created to receive the new customers OR perhaps even lump them with already existing independent filesets that have matching backup requirements. I would NOT try to create backups for each dependent fileset. But stick with the supported facilities to manage backup per independent... Having said that, if you'd still like to do backup per dependent fileset -- then have at it -- but test, test and retest.... And measure performance... My GUESS is that IF you can hack mmbackup or similar to use mmapplypolicy /path-to-dependent-fileset --scope fileset .... instead of mmapplypolicy /path-to-independent-fileset --scope inodespace .... You'll be okay because the inodescan where you end up reading some extra inodes is probably a tiny fraction of all the other IO you'll be doing! BUT I don't think IBM is in a position to encourage you to hack mmbackup -- it's already very complicated! From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/17/2018 07:40 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, well as I think I cannot simply "move" dependent filesets between independent ones and our customers must have the opportunity to change data protection policy for their Containers at any given time, I cannot map them to a "backed up" or "not backed up" independent fileset. So how much performance impact is lets say 1-10 exclude.dir directives per independent fileset? Many thanks in advance. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Tuesday, August 14, 2018 5:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... 
since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sat Aug 18 03:34:30 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 17 Aug 2018 22:34:30 -0400 Subject: [gpfsug-discuss] TCP_QUICKACK In-Reply-To: References: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> Message-ID: <3de256a6-c8f0-3e44-baf8-3f32fb0c4a06@nasa.gov> Thanks! Appreciate the quick answer. On 8/13/18 3:25 PM, IBM Spectrum Scale wrote: > Hi Aaron, > > I just searched the core GPFS source code. I didn't find TCP_QUICKACKbeing used explicitly. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for "Knister, Aaron S. 
(GSFC-606.2)[InuTeq, LLC]"
> From: "Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]"
> To: gpfsug main discussion list
> Date: 08/13/2018 02:48 PM
> Subject: [gpfsug-discuss] TCP_QUICKACK
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> This is a question mostly for the devs. but really for anyone who can answer.
>
> Does GPFS use the TCP_QUICKACK socket flag on Linux?
>
> I'm debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I'm curious if GPFS is explicitly doing this or if there's just a timing window in the RPC behavior that just makes it look that way.
>
> -Aaron
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From david_johnson at brown.edu Mon Aug 20 17:55:18 2018
From: david_johnson at brown.edu (David Johnson)
Date: Mon, 20 Aug 2018 12:55:18 -0400
Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P
Message-ID: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu>

I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full.
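(The pool listing that follows looks like the per-pool section of mmdf output; on this cluster it was presumably produced by something along the lines of "mmdf gpfs", with only the cit_10tb pool shown -- the device name here is a guess.)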
Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB)
d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%)
d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%)
d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%)
d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%)
d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%)
d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%)
d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%)
d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%)
d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%)
d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%)
d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%)
d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%)
d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%)
d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%)
------------- -------------------- -------------------
(pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%)

Will the command "mmrestripefs /gpfs -b -P cit_10tb" move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full?

Thanks,
-- ddj
Dave Johnson
Brown University CCV/CIS

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From stockf at us.ibm.com Mon Aug 20 19:02:05 2018
From: stockf at us.ibm.com (Frederick Stock)
Date: Mon, 20 Aug 2018 14:02:05 -0400
Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P
In-Reply-To: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu>
References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu>
Message-ID:

That should do what you want. Be aware that mmrestripefs generates significant IO load so you should either use the QoS feature to mitigate its impact or run the command when the system is not very busy.

Note you have two additional NSDs in the 33 failure group than you do in the 23 failure group. You may want to change one of those NSDs in failure group 33 to be in failure group 23 so you have equal storage space in both failure groups.

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com

From: David Johnson
To: gpfsug main discussion list
Date: 08/20/2018 12:55 PM
Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P
Sent by: gpfsug-discuss-bounces at spectrumscale.org

I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full.

Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB)
d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%)
d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%)
d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%)
d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%)
d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%)
d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%)
d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%)
d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%)
d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%)
d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%)
d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%)
d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%)
d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%)
d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%)
------------- -------------------- -------------------
(pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%)

Will the command "mmrestripefs /gpfs -b -P cit_10tb"
move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Mon Aug 20 19:06:23 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 20 Aug 2018 14:06:23 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: Does anyone have a good rule of thumb for iops to allow for background QOS tasks? -- ddj Dave Johnson > On Aug 20, 2018, at 2:02 PM, Frederick Stock wrote: > > That should do what you want. Be aware that mmrestripefs generates significant IO load so you should either use the QoS feature to mitigate its impact or run the command when the system is not very busy. > > Note you have two additional NSDs in the 33 failure group than you do in the 23 failure group. You may want to change one of those NSDs in failure group 33 to be in failure group 23 so you have equal storage space in both failure groups. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: David Johnson > To: gpfsug main discussion list > Date: 08/20/2018 12:55 PM > Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. > The new half is only 50% full, and the old half is 94% full. > > Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) > d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) > d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) > d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) > d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) > d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) > d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) > d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) > d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) > d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) > d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) > d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) > d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) > d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) > d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) > ------------- -------------------- ------------------- > (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) > > Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, > so that they end up all around 75% full? > > Thanks, > ? ddj > Dave Johnson > Brown University CCV/CIS_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Mon Aug 20 19:13:51 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 20 Aug 2018 11:13:51 -0700 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: Hey Dave, Can you say more about what you are trying to accomplish by doing the rebalance? IME, the performance hit from running the rebalance was higher than the performance hit from writes being directed to a subset of the disks. If you have any churn of the data, eventually they will rebalance anyway. Regards, Alex On Mon, Aug 20, 2018 at 11:06 AM wrote: > Does anyone have a good rule of thumb for iops to allow for background QOS > tasks? > > > > -- ddj > Dave Johnson > > On Aug 20, 2018, at 2:02 PM, Frederick Stock wrote: > > That should do what you want. Be aware that mmrestripefs generates > significant IO load so you should either use the QoS feature to mitigate > its impact or run the command when the system is not very busy. > > Note you have two additional NSDs in the 33 failure group than you do in > the 23 failure group. You may want to change one of those NSDs in failure > group 33 to be in failure group 23 so you have equal storage space in both > failure groups. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: David Johnson > To: gpfsug main discussion list > Date: 08/20/2018 12:55 PM > Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I have one storage pool that was recently doubled, and another pool > migrated there using mmapplypolicy. > The new half is only 50% full, and the old half is 94% full. > > Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) > d05_george_23 50.49T 23 No Yes 25.91T ( 51%) > 18.93G ( 0%) > d04_george_23 50.49T 23 No Yes 25.91T ( 51%) > 18.9G ( 0%) > d03_george_23 50.49T 23 No Yes 25.9T ( 51%) > 19.12G ( 0%) > d02_george_23 50.49T 23 No Yes 25.9T ( 51%) > 19.03G ( 0%) > d01_george_23 50.49T 23 No Yes 25.9T ( 51%) > 18.92G ( 0%) > d00_george_23 50.49T 23 No Yes 25.91T ( 51%) > 19.05G ( 0%) > d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.35G ( 0%) > d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.2G ( 0%) > d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 69.93G ( 0%) > d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) > 70.11G ( 0%) > d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.08G ( 0%) > d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) > 70.3G ( 0%) > d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) > 70.25G ( 0%) > d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) > 70.28G ( 0%) > ------------- -------------------- > ------------------- > (pool total) 706.9T 180.1T ( 25%) > 675.5G ( 0%) > > Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks > from the _cit_ NSDs to the _george_ NSDs, > so that they end up all around 75% full? > > Thanks, > ? 
ddj > Dave Johnson > Brown University CCV/CIS_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Aug 20 23:08:28 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 20 Aug 2018 18:08:28 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: <23047.1534802908@turing-police.cc.vt.edu> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: > Note you have two additional NSDs in the 33 failure group than you do in > the 23 failure group. You may want to change one of those NSDs in failure > group 33 to be in failure group 23 so you have equal storage space in both > failure groups. Keep in mind that the failure groups should be built up based on single points of failure. In other words, a failure group should consist of disks that will all stay up or all go down on the same failure (controller, network, whatever). Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', it sounds very likely that they are in two different storage arrays, and you should make your failure groups so they don't span a storage array. In other words, taking a 'cit' disk and moving it into a 'george' failure group will Do The Wrong Thing, because if you do data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk that's in the same array as the 'george' disk. If 'george' fails, you lose access to both replicas. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From david_johnson at brown.edu Mon Aug 20 23:21:08 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 20 Aug 2018 18:21:08 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <23047.1534802908@turing-police.cc.vt.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> <23047.1534802908@turing-police.cc.vt.edu> Message-ID: Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. I think we may leave things alone for now regarding the original question, rebalancing this pool. -- ddj Dave Johnson > On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: > > On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: > >> Note you have two additional NSDs in the 33 failure group than you do in >> the 23 failure group. 
You may want to change one of those NSDs in failure >> group 33 to be in failure group 23 so you have equal storage space in both >> failure groups. > > Keep in mind that the failure groups should be built up based on single points of failure. > In other words, a failure group should consist of disks that will all stay up or all go down on > the same failure (controller, network, whatever). > > Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', > it sounds very likely that they are in two different storage arrays, and you should make your > failure groups so they don't span a storage array. In other words, taking a 'cit' disk > and moving it into a 'george' failure group will Do The Wrong Thing, because if you do > data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk > that's in the same array as the 'george' disk. If 'george' fails, you lose access to both > replicas. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.s.knister at nasa.gov Tue Aug 21 01:05:07 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Tue, 21 Aug 2018 00:05:07 +0000 Subject: [gpfsug-discuss] fcntl ENOTTY Message-ID: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> Nothing worse than a vague question with little context, eh? Well... Does anyone know why GPFS might return ENOTTY to an fcntl(fd, F_SETLKW, &lock) where lock.l_type is set to F_RDLCK? The error prompting this question looks almost identical to the one in this (unfortunately unanswered) thread: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2014-June/000412.html -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Aug 21 04:28:19 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 20 Aug 2018 23:28:19 -0400 Subject: [gpfsug-discuss] fcntl ENOTTY In-Reply-To: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> References: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> Message-ID: <5e34373c-d6ff-fca7-4254-64958f636b69@nasa.gov> Argh... Please disregard (I think). Apparently, mpich uses "%X" to format errno (oh yeah, sure, why not use %p to print strings while we're at it) which means that the errno is *actually* 37 which is ENOLCK. Ok, now there's something I can work with. -Aaron p.s. I'm sure that formatting errno with %X made sense at the time (ok, no I'm not), but it sent me down a hell of a rabbit hole and I'm just bitter. No offense intended. On 8/20/18 8:05 PM, Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC] wrote: > Nothing worse than a vague question with little context, eh? Well... > > Does anyone know why GPFS might return ENOTTY to an fcntl(fd, F_SETLKW, &lock) where lock.l_type is set to F_RDLCK? 
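(For anyone who wants to reproduce the call outside of MPI, the operation being discussed boils down to a blocking shared-lock request along the lines of the standalone sketch below; the file name handling, flags and error reporting are mine, not taken from the MPI code involved.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-on-gpfs>\n", argv[0]);
        return 2;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock lk;
    memset(&lk, 0, sizeof(lk));
    lk.l_type   = F_RDLCK;   /* shared (read) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start  = 0;
    lk.l_len    = 0;         /* 0 = lock to end of file */

    /* F_SETLKW blocks until the lock is granted or fails with an errno */
    if (fcntl(fd, F_SETLKW, &lk) == -1)
        fprintf(stderr, "fcntl(F_SETLKW, F_RDLCK) failed: errno=%d (%s)\n",
                errno, strerror(errno));
    else
        printf("read lock acquired\n");
    return 0;
}

Compiled with plain gcc and pointed at a file on the GPFS mount, it should either block, succeed, or report the errno being discussed in decimal rather than hex.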
>
> The error prompting this question looks almost identical to the one in this (unfortunately unanswered) thread:
>
> http://www.spectrumscale.org/pipermail/gpfsug-discuss/2014-June/000412.html
>
> -Aaron
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From luis.bolinches at fi.ibm.com Tue Aug 21 05:11:24 2018
From: luis.bolinches at fi.ibm.com (Luis Bolinches)
Date: Tue, 21 Aug 2018 04:11:24 +0000
Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P
In-Reply-To: Message-ID:

Hi, you can enable QoS first with the limits left at "inf" (unlimited) to see the current usage values, and set the real limits later on. Those limits are modifiable online, so even if you have quieter periods (not your case, it seems) they can be raised for the restripe/replication work and lowered again at peak times.

SENT FROM MOBILE DEVICE
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations
Luis Bolinches
Consultant IT Specialist
Mobile Phone: +358503112585
https://www.youracclaim.com/user/luis-bolinches
"If you always give you will always have" -- Anonymous

> On 21 Aug 2018, at 1.21, david_johnson at brown.edu wrote:
>
> Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time.
>
> I have considered using QOS when we run policy migrations but haven't yet because I don't know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I've just let it run at native speed. I'd like to know what other folks have used for QOS settings.
>
> I think we may leave things alone for now regarding the original question, rebalancing this pool.
>
> -- ddj
> Dave Johnson
>
>> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote:
>>
>> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said:
>>
>>> Note you have two additional NSDs in the 33 failure group than you do in
>>> the 23 failure group. You may want to change one of those NSDs in failure
>>> group 33 to be in failure group 23 so you have equal storage space in both
>>> failure groups.
>>
>> Keep in mind that the failure groups should be built up based on single points of failure.
>> In other words, a failure group should consist of disks that will all stay up or all go down on
>> the same failure (controller, network, whatever).
>>
>> Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33',
>> it sounds very likely that they are in two different storage arrays, and you should make your
>> failure groups so they don't span a storage array. In other words, taking a 'cit' disk
>> and moving it into a 'george' failure group will Do The Wrong Thing, because if you do
>> data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk
>> that's in the same array as the 'george' disk. If 'george' fails, you lose access to both
>> replicas.
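(If one did decide to even out the failure groups as Fred suggested, the change itself would presumably look something like the sketch below; the NSD name and device name are placeholders, and the stanza fields should be double-checked against the mmchdisk man page before use.)

# hypothetical: move one NSD from failure group 33 to 23, then re-place replicas
cat > /tmp/fg.stanza <<EOF
%nsd:
  nsd=d07_cit_33
  failureGroup=23
EOF
mmchdisk gpfs change -F /tmp/fg.stanza
mmrestripefs gpfs -r   # restore proper replica placement afterwards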
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Tue Aug 21 15:48:15 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Tue, 21 Aug 2018 14:48:15 +0000 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE804526743F1B@MBX114.d.ethz.ch> More precisely the problem is the following: If I set period=1 for a "rate" sensor (network speed, NSD read/write speed, PDisk read/write speed) everything is correct because every second the sensors get the valuess of the cumulative counters (and do not divide it by 1, which is not affecting anything for 1 second). If I set the period=2, the "rate" sensors collect the values from the cumulative counters every two seconds but they do not divide by 2 those values (because pmsensors do not actually divide; they seem to silly report what they read which is understand-able from a performance point of view); then grafana receives as double as the real speed. I've to correct myself: here the point is not how sampling/downsampling is done by grafana/grafana-bridge/whatever as I wrongly wrote in my first email. The point is: if I collect data every N seconds (because I do not want to overloads the pmcollector node), how can I divide (in grafana) the reported collected data by N to get real avg speed in that N-seconds time interval ?? At the moment it seems that the only option is using N=1, which is bad because, as I stated, it overloads the collector when many nodes run many pmsensors... A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com] Sent: Friday, July 27, 2018 8:27 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] How Zimon/Grafana-bridge process data Hi, as there are more often similar questions rising, we just put an article about the topic on the Spectrum Scale Wiki https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data While there will be some minor updates on the article in the next time, it might already explain your questions. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. [Inactive hide details for "Dorigo Alvise (PSI)" ---13.07.2018 12:08:59---Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 s]"Dorigo Alvise (PSI)" ---13.07.2018 12:08:59---Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 13.07.2018 12:08 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From makaplan at us.ibm.com Tue Aug 21 16:42:37 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 11:42:37 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P; using QOS features In-Reply-To: References: Message-ID: (Aside from QOS, I second the notion to review your "failure groups" if you are using and depending on data replication.) For QOS, some suggestions: You might want to define a set of nodes that will do restripes using `mmcrnodeclass restripers -N ...` You can initially just enable `mmchqos FS --enable` and then monitor performance of your restripefs command `mmrestripefs FS -b -N restripers` that restricts operations to the restripers nodeclass. 
with `mmlsqos FS --seconds 60 [[see other options]]` Suppose you see an average iops rates of several thousand IOPs and you decide that is interfering with other work... Then, for example, you could "slow down" or "pace" mmrestripefs to use 999 iops within the system pool and 1999 iops within the data pool with: mmchqos FS --enable -N restripers pool=system,maintenance=999iops pool=data,maintenance=1999iops And monitor that with mmlsqos. Tip: For a more graphical view of QOS and disk performance, try samples/charts/qosplotfine.pl. You will need to have gnuplot working... If you are "into" performance tools you might want to look at the --fine-stats options of mmchqos and mmlsqos and plug that into your favorite performance viewer/plotter/analyzer tool(s). (Technical: mmlsqos --fine-stats is written to be used and digested by scripts, no so much for human "eyeballing". The --fine-stats argument of mmchqos is a number of seconds. The --fine-stats argument of mmlsqos is one or two index values. The doc for mmlsqos explains this and the qosplotfine.pl script is an example of how to use it. ) From: "Luis Bolinches" To: "gpfsug main discussion list" Date: 08/21/2018 12:56 AM Subject: Re: [gpfsug-discuss] Rebalancing with mmrestripefs -P Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi You can enable QoS first to see the activity while on inf value to see the current values of usage and set the li is later on. Those limits are modificable online so even in case you have (not your case it seems) less activity times those can be increased for replication then and Lowe again on peak times. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 21 Aug 2018, at 1.21, david_johnson at brown.edu wrote: > > Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. > > I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. > > I think we may leave things alone for now regarding the original question, rebalancing this pool. > > -- ddj > Dave Johnson > >> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: >> >> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: >> >>> Note you have two additional NSDs in the 33 failure group than you do in >>> the 23 failure group. You may want to change one of those NSDs in failure >>> group 33 to be in failure group 23 so you have equal storage space in both >>> failure groups. >> >> Keep in mind that the failure groups should be built up based on single points of failure. >> In other words, a failure group should consist of disks that will all stay up or all go down on >> the same failure (controller, network, whatever). >> >> Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', >> it sounds very likely that they are in two different storage arrays, and you should make your >> failure groups so they don't span a storage array. 
In other words, taking a 'cit' disk >> and moving it into a 'george' failure group will Do The Wrong Thing, because if you do >> data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk >> that's in the same array as the 'george' disk. If 'george' fails, you lose access to both >> replicas. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Richard.E.Powell at boeing.com Tue Aug 21 19:23:50 2018 From: Richard.E.Powell at boeing.com (Powell (US), Richard E) Date: Tue, 21 Aug 2018 18:23:50 +0000 Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Hi all, I'm trying to use the "GROUP POOL" feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I'm having is that it seems to be identifying the candidates correctly but, anytime I use the "group pool" name for the "to pool", it only selects the first candidate for migration. If I specify a single pool name for the "to pool", it selects multiple files as expected. Here are the policy rules I'm using: RULE 'gp' GROUP POOL 'gpool' is 'ssd' then 'disk1' RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT) I'm not sure if I'm misunderstanding something or if this is a real bug. I'm just wondering if anyone else has run into this issue? I'm running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 21 20:45:10 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 15:45:10 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> References: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Message-ID: Migrate to a group pool "repacks" the selected files over the pools that comprise the group IN THE ORDER SPECIFIED UP TO THE SPECIFIED LIMIT for each pool. To see this work, in your case, set a limit that is near the current occupancy of pool 'ssd'. For example: RULE ?gp? GROUP POOL ?gpool? is ?ssd? LIMIT(50) then ?disk1? Notice the documentation says the LIMIT defaults to 99. Also, if you've run the same policy before and nothings changed much, then of course, there's not going to be much "repacking" to be done, maybe not any. If the behaviour still doesn't make sense to you, try testing on a tiny file system with just a few small pools, sizing pools and files so that only a few files will fit in a pool... If you build such a test scenario and that still doesn't make sense, show us the example... ----------------------------------- From: "Powell (US), Richard E" Hi all, I?m trying to use the ?GROUP POOL? 
feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I?m having is that it seems to be identifying the candidates correctly but, anytime I use the ?group pool? name for the ?to pool?, it only selects the first candidate for migration. If I specify a single pool name for the ?to pool?, it selects multiple files as expected. Here are the policy rules I?m using: RULE ?gp? GROUP POOL ?gpool? is ?ssd? then ?disk1? RULE ?repack? MIGRATE FROM POOL ?gpool? TO POOL ?gpool? WEIGHT(FILE_HEAT) I?m not sure if I?m misunderstanding something or if this is a real bug. I?m just wondering if anyone else has run into this issue? I?m running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 21 21:11:10 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 16:11:10 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: References: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Message-ID: To repack in random order, which might be an interesting and easy way to test and demonstrate... Use the RAND() function: RULE ... MIGRATE ... WEIGHT(RAND()) ... -L 3 on the mmapplypolicy command will make the random weights evident in the output. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Aug 22 18:12:24 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 22 Aug 2018 17:12:24 +0000 Subject: [gpfsug-discuss] Those users.... Message-ID: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Sometimes, I look at the data that's being stored in my file systems and just shake my head: /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Wed Aug 22 19:17:01 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 22 Aug 2018 14:17:01 -0400 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <422107E9-0AD1-49F8-99FD-D6713F90A844@ulmer.org> Clearly, those are the ones they?re working on. You?re lucky they?re de-duped. -- Stephen > On Aug 22, 2018, at 1:12 PM, Oesterlin, Robert wrote: > > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From linesr at janelia.hhmi.org Wed Aug 22 19:54:22 2018 From: linesr at janelia.hhmi.org (Lines, Robert) Date: Wed, 22 Aug 2018 18:54:22 +0000 Subject: [gpfsug-discuss] Those users.... Message-ID: Make a better storage system and they will find a better way to abuse it. A PI during an annual talk to the facility: Because databases are hard and file systems have done a far better job of scaling we have implemented our datastore using files, file name and directory names. 
It handles the high concurrency far better than any database server we could have built for the amount we are charged for that same very tiny amount of data. Ignoring that the internal pricing for storage is based on sane usage and not packing your entire data set into small enough files that it all lives in the SSD tier. So I feel for you. Rob From: on behalf of "Oesterlin, Robert" Reply-To: gpfsug main discussion list Date: Wednesday, August 22, 2018 at 1:12 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Those users.... Sometimes, I look at the data that's being stored in my file systems and just shake my head: /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Wed Aug 22 20:32:56 2018 From: bipcuds at gmail.com (Keith Ball) Date: Wed, 22 Aug 2018 15:32:56 -0400 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Message-ID: Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From Richard.E.Powell at boeing.com Wed Aug 22 21:17:44 2018 From: Richard.E.Powell at boeing.com (Powell (US), Richard E) Date: Wed, 22 Aug 2018 20:17:44 +0000 Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: Allow me to elaborate on my question. The example I gave was trimmed-down to the minimum. I've been trying various combinations with different LIMIT values and different weight and where clauses, using '-I test' and '-I prepare' to see what it would do, but not actually doing the migration. The 'ssd' pool is about 36% utilized and I've been starting the mmapplypolicy scan at a sub-directory level where nearly all the files were in the disk pool. (You'll just have to trust me that the ssd pool can hold all of them :-)) If I specify 'ssd' as the "to pool", the output from the test or prepare options indicates that it would be able to migrate all of the candidate files to the ssd pool. But, if I specify the group pool as the "to pool", it is only willing to migrate the first candidate. That is with the ssd pool listed first in the group and with any limit as long as it's big enough to hold the current data plus the files I expected it to select, even the default of 99. I'm sure I'm either doing something wrong, or I *really* misunderstand the concept. It seems straight forward enough.... Thanks to everyone for your time! 
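(For anyone trying to reproduce this on a scratch file system, a minimal test along the lines below should show whether more than one candidate gets repacked; the pool names, path and LIMIT value are placeholders, and file heat tracking has to be enabled via fileHeatPeriodMinutes for the weights to be meaningful.)

/* repack.pol -- two-pool group repack test */
RULE 'gp' GROUP POOL 'gpool' is 'ssd' LIMIT(90) then 'disk1'
RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT)

mmapplypolicy /gpfs/some/testdir -P repack.pol -I test -L 3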
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, August 22, 2018 4:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 47 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Problem using group pool for migration (Powell (US), Richard E) 2. Re: Problem using group pool for migration (Marc A Kaplan) 3. Re: Problem using group pool for migration (Marc A Kaplan) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Aug 2018 18:23:50 +0000 From: "Powell (US), Richard E" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: <7a0a914601594ccdb6c96504322de9c8 at XCH15-09-11.nw.nos.boeing.com> Content-Type: text/plain; charset="us-ascii" Hi all, I'm trying to use the "GROUP POOL" feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I'm having is that it seems to be identifying the candidates correctly but, anytime I use the "group pool" name for the "to pool", it only selects the first candidate for migration. If I specify a single pool name for the "to pool", it selects multiple files as expected. Here are the policy rules I'm using: RULE 'gp' GROUP POOL 'gpool' is 'ssd' then 'disk1' RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT) I'm not sure if I'm misunderstanding something or if this is a real bug. I'm just wondering if anyone else has run into this issue? I'm running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 21 Aug 2018 15:45:10 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem using group pool for migration Message-ID: Content-Type: text/plain; charset="utf-8" Migrate to a group pool "repacks" the selected files over the pools that comprise the group IN THE ORDER SPECIFIED UP TO THE SPECIFIED LIMIT for each pool. To see this work, in your case, set a limit that is near the current occupancy of pool 'ssd'. For example: RULE ?gp? GROUP POOL ?gpool? is ?ssd? LIMIT(50) then ?disk1? Notice the documentation says the LIMIT defaults to 99. Also, if you've run the same policy before and nothings changed much, then of course, there's not going to be much "repacking" to be done, maybe not any. If the behaviour still doesn't make sense to you, try testing on a tiny file system with just a few small pools, sizing pools and files so that only a few files will fit in a pool... If you build such a test scenario and that still doesn't make sense, show us the example... ----------------------------------- From: "Powell (US), Richard E" Hi all, I?m trying to use the ?GROUP POOL? feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. 
The problem I?m having is that it seems to be identifying the candidates correctly but, anytime I use the ?group pool? name for the ?to pool?, it only selects the first candidate for migration. If I specify a single pool name for the ?to pool?, it selects multiple files as expected. Here are the policy rules I?m using: RULE ?gp? GROUP POOL ?gpool? is ?ssd? then ?disk1? RULE ?repack? MIGRATE FROM POOL ?gpool? TO POOL ?gpool? WEIGHT(FILE_HEAT) I?m not sure if I?m misunderstanding something or if this is a real bug. I?m just wondering if anyone else has run into this issue? I?m running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Tue, 21 Aug 2018 16:11:10 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem using group pool for migration Message-ID: Content-Type: text/plain; charset="us-ascii" To repack in random order, which might be an interesting and easy way to test and demonstrate... Use the RAND() function: RULE ... MIGRATE ... WEIGHT(RAND()) ... -L 3 on the mmapplypolicy command will make the random weights evident in the output. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 47 ********************************************** From valdis.kletnieks at vt.edu Wed Aug 22 21:35:57 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 22 Aug 2018 16:35:57 -0400 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <168045.1534970157@turing-police.cc.vt.edu> On Wed, 22 Aug 2018 17:12:24 -0000, "Oesterlin, Robert" said: > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) I've got 114,029 files of the form: /gpfs/archive/tenant/this/that/F:\the\other\thing\what\where\they\thinking/apparently/not/much.dat I admit being mystified - how does such a mess happen? (Note that our tenant users are only able to access the GPFS filesystem through NFS - which is only exported to other Linux systems....) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From jfosburg at mdanderson.org Wed Aug 22 21:44:29 2018 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Wed, 22 Aug 2018 20:44:29 +0000 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <168045.1534970157@turing-police.cc.vt.edu> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <168045.1534970157@turing-police.cc.vt.edu> Message-ID: <42A96B62-CD95-458B-A702-F6ECFAC66AEF@mdanderson.org> A very, very long time ago we had an AIX system (4.3 with jfs1) where the users logged in interactively. We would find files with names like: /C:\some\very \non-posix\path/file There's a reason they're called lusers. 
?On 8/22/18, 3:36 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 22 Aug 2018 17:12:24 -0000, "Oesterlin, Robert" said: > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) I've got 114,029 files of the form: /gpfs/archive/tenant/this/that/F:\the\other\thing\what\where\they\thinking/apparently/not/much.dat I admit being mystified - how does such a mess happen? (Note that our tenant users are only able to access the GPFS filesystem through NFS - which is only exported to other Linux systems....) The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. From jonathan.buzzard at strath.ac.uk Wed Aug 22 23:37:55 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 22 Aug 2018 23:37:55 +0100 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: On 22/08/18 18:12, Oesterlin, Robert wrote: > Sometimes, I look at the data that's being stored in my file systems and > just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains > 17,967,350 files (in ONE directory) > That's what inode quota's are for. Set it pretty high to begin with, say one million. That way the vast majority of users have no issues ever. Then the troublesome few will have issues at which point you can determine why they are storing so many files, and appropriately educate them on better ways to do it. Finally if they really need that many files just charge them for it :-) Having lots of files has a cost just like having lots of data has a cost, and it's not fair for the reasonable users to subsidize them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Thu Aug 23 00:02:28 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 22 Aug 2018 23:02:28 +0000 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Aug 23 01:59:11 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 22 Aug 2018 17:59:11 -0700 Subject: [gpfsug-discuss] Those users.... 
In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <20180823005911.GA5982@almaren> On Wed, Aug 22, 2018 at 11:37:55PM +0100, Jonathan Buzzard wrote: > On 22/08/18 18:12, Oesterlin, Robert wrote: > >Sometimes, I look at the data that's being stored in my file systems and > >just shake my head: > > > >/gpfs//Restricted/EventChangeLogs/deduped/working contains > >17,967,350 files (in ONE directory) > > > > That's what inode quota's are for. Set it pretty high to begin with, say one > million. That way the vast majority of users have no issues ever. Then the > troublesome few will have issues at which point you can determine why they > are storing so many files, and appropriately educate them on better ways to > do it. Finally if they really need that many files just charge them for it > :-) Having lots of files has a cost just like having lots of data has a > cost, and it's not fair for the reasonable users to subsidize them. Yep, we set our fileset inode quota to 1 million/TB of allocated space. It seems overly generous to me but it's far better than no limit at all. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From rohwedder at de.ibm.com Thu Aug 23 09:51:39 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Thu, 23 Aug 2018 10:51:39 +0200 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Keith Ball To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: ??? ??????? ??? but it appears that port 80 specifically is used also by the GUI's Web service. 
I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, ? Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 15917110.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Juri.Haberland at rohde-schwarz.com Thu Aug 23 10:24:38 2018 From: Juri.Haberland at rohde-schwarz.com (Juri Haberland) Date: Thu, 23 Aug 2018 09:24:38 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Message-ID: Hello Markus, I?m not sure how to interpret your answer: Do the internal processes connect to the non-privileged ports (47443 and 47080) or the privileged ports? If they use the privileged ports we would appreciate it if IBM could change that behavior to using the non-privileged ports so one could change the privileged ones or use a httpd server in front of the GUI web service. We are going to need this in the near future as well? Thanks & kind regards. Juri Haberland -- Juri Haberland R&D SW File Based Media Solutions | 7TF1 Rohde & Schwarz GmbH & Co. KG Hanomaghof 1 | 30449 Hannover Phone: +49 511 678 07 246 | Fax: +49 511 678 07 200 Internet: www.rohde-schwarz.com Gesch?ftsf?hrung / Executive Board: Christian Leicher (Vorsitzender / Chairman), Peter Riedel, Sitz der Gesellschaft / Company's Place of Business: M?nchen, Registereintrag / Commercial Register No.: HRA 16 270, Pers?nlich haftender Gesellschafter / Personally Liable Partner: RUSEG Verwaltungs-GmbH, Sitz der Gesellschaft / Company's Place of Business: M?nchen, Registereintrag / Commercial Register No.: HRB 7 534, Umsatzsteuer-Identifikationsnummer (USt-IdNr.) / VAT Identification No.: DE 130 256 683, Elektro-Altger?te Register (EAR) / WEEE Register No.: DE 240 437 86 From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Markus Rohwedder Sent: Thursday, August 23, 2018 10:52 AM To: gpfsug main discussion list Subject: *EXT* [Newsletter] Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. 
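[For reference, redirects of the kind described above are normally expressed as NAT rules; a rough sketch of what such a redirect looks like -- illustrative only, not necessarily the exact rules the GUI installs or the way it manages them.]

  # Redirect the privileged ports to the unprivileged ports the GUI service actually listens on
  iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 47080
  iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 47443
  # Locally generated traffic bypasses PREROUTING, so a matching rule in the nat OUTPUT
  # chain is usually needed as well if the GUI is accessed from the node itself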
If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:image003.png at 01D43AD3.9FE459C0] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the]Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? From: Keith Ball > To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 166 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 4659 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.gif Type: image/gif Size: 105 bytes Desc: image004.gif URL: From daniel.kidger at uk.ibm.com Thu Aug 23 11:13:04 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 23 Aug 2018 10:13:04 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.3__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From rohwedder at de.ibm.com Thu Aug 23 12:50:32 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Thu, 23 Aug 2018 13:50:32 +0200 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: , Message-ID: Hello Juri, Keith, thank you for your responses. 
The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. However we have not set up, tested nor documented such a configuration as of today. Currently the GUI service manages the iptables redirect bring up and tear down. If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to - not check for already used ports when the GUI service tries to start up - don't create/destroy port forwarding rules during GUI service start and stop. This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. See this KC link: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Daniel Kidger" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 23.08.2018 12:13 Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case because they had their own web based management interface on the same https port. Is this the same reason that you have? If so I wonder how many other sites have the same issue? One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). Then run the two different GUIs, one per IP address. Is this an option, albeit a little ugly? Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-(0)7818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Markus Rohwedder" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Date: Thu, Aug 23, 2018 9:51 AM Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. 
Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany Inactive hide details for Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? From: Keith Ball To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17153317.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17310450.gif Type: image/gif Size: 60281 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu Aug 23 14:27:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 23 Aug 2018 13:27:41 +0000 Subject: [gpfsug-discuss] Call home Message-ID: <696B8436-17A4-4EEC-933E-7B1B0B13D498@bham.ac.uk> Hi, I?m just having a poke around with the callhome feature. If I use `mmcallhome group auto`, I can see that it creates a group. Now if I add a node to the cluster, how to I add that node to the same call home group that is already present? If I try for example: $ mmcallhome group add autoGroup_1 MYNEWSERVER --node all Failed to add this group: Group name "autoGroup_1" is already used ? 
Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Aug 23 14:57:43 2018 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 23 Aug 2018 13:57:43 +0000 Subject: [gpfsug-discuss] Call home In-Reply-To: <696B8436-17A4-4EEC-933E-7B1B0B13D498@bham.ac.uk> Message-ID: Hi Simon, Just recreate the group using mmcallhome group auto command together with ?force option to overwrite the existing group. Sent from my iPhone using IBM Verse On 23. Aug 2018, 15:27:51, S.J.Thompson at bham.ac.uk wrote: From: S.J.Thompson at bham.ac.uk To: gpfsug-discuss at spectrumscale.org Cc: Date: 23. Aug 2018, 15:27:51 Subject: [gpfsug-discuss] Call home Hi, I?m just having a poke around with the callhome feature. If I use `mmcallhome group auto`, I can see that it creates a group. Now if I add a node to the cluster, how to I add that node to the same call home group that is already present? If I try for example: $ mmcallhome group add autoGroup_1 MYNEWSERVER --node all Failed to add this group: Group name "autoGroup_1" is already used ? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 23 15:25:00 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 10:25:00 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: References: Message-ID: Richard Powell, Good that you have it down to a smallish test case. Let's see it! Here's my test case. Notice I use -L 2 and -I test to see what's what: [root@/main/gpfs-git]$mmapplypolicy c41 -P /gh/c41gp.policy -L 2 -I test [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied cool 66048 9436160 0.699945741% system 1190656 8388608 14.193725586% xtra 66048 8388608 0.787353516% [I] 4045 of 65792 inodes used: 6.148164%. [I] Loaded policy rules from /gh/c41gp.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2018-08-23 at 14:11:26 UTC Parsed 2 policy rules. rule 'gp' group pool 'gp' is 'system' limit(3) then 'cool' limit(4) then 'xtra' rule 'mig' migrate from pool 'gp' to pool 'gp' weight(rand()) [I] 2018-08-23 at 14:11:26.367 Directory entries scanned: 8. [I] Directories scan: 7 files, 1 directories, 0 other objects, 0 'skipped' files and/or errors. [I] 2018-08-23 at 14:11:26.371 Sorting 8 file list records. [I] 2018-08-23 at 14:11:26.416 Policy evaluation. 8 files scanned. [I] 2018-08-23 at 14:11:26.421 Sorting 7 candidate file list records. WEIGHT(0.911647) MIGRATE /c41/100e TO POOL gp/cool SHOW() WEIGHT(0.840188) MIGRATE /c41/100a TO POOL gp/cool SHOW() WEIGHT(0.798440) MIGRATE /c41/100d TO POOL gp/cool SHOW() WEIGHT(0.783099) MIGRATE /c41/100c TO POOL gp/xtra SHOW() WEIGHT(0.394383) MIGRATE /c41/100b TO POOL gp/xtra SHOW() WEIGHT(0.335223) MIGRATE /c41/100g TO POOL gp/xtra SHOW() WEIGHT(0.197551) MIGRATE /c41/100f TO POOL gp/xtra SHOW() [I] 2018-08-23 at 14:11:26.430 Choosing candidate files. 7 records scanned. [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 7 716800 7 716800 0 RULE 'mig' MIGRATE FROM POOL 'gp' WEIGHT(.) \ TO POOL 'gp' [I] Filesystem objects with no applicable rules: 1. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 716800KB: 7 of 7 candidates; [I] File Migrations within Group Pools Group Pool Files_Out KB_Out Files_In KB_In gp system 7 716800 0 0 gp cool 0 0 3 307200 gp xtra 0 0 4 409600 Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied cool 373248 9436160 3.955507325% system 473856 8388608 5.648803711% xtra 475648 8388608 5.670166016% -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Aug 23 16:23:33 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 11:23:33 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Aug 23 16:32:24 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Thu, 23 Aug 2018 11:32:24 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson > On Aug 23, 2018, at 11:23 AM, Marc A Kaplan wrote: > > Millions of files per directory, may well be a mistake... > > BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- > because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 23 18:01:27 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 13:01:27 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> Message-ID: Even with nfs or samba export you're probably okay as long as the application does not attempt to list the directory. Just probe it with stat/open/create/unlink. From: david_johnson at brown.edu To: gpfsug main discussion list Date: 08/23/2018 11:34 AM Subject: Re: [gpfsug-discuss] Those users.... 
millions of files per directory - not necessarily a mistake Sent by: gpfsug-discuss-bounces at spectrumscale.org But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson On Aug 23, 2018, at 11:23 AM, Marc A Kaplan wrote: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Aug 23 19:30:30 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Aug 2018 18:30:30 +0000 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> Message-ID: Thankfully all application developers completely understand why listing directories are a bad idea... ;o) Or at least they will learn the hard way otherwise, -B From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Marc A Kaplan Sent: Thursday, August 23, 2018 12:01 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake Note: External Email ________________________________ Even with nfs or samba export you're probably okay as long as the application does not attempt to list the directory. Just probe it with stat/open/create/unlink. From: david_johnson at brown.edu To: gpfsug main discussion list > Date: 08/23/2018 11:34 AM Subject: Re: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson On Aug 23, 2018, at 11:23 AM, Marc A Kaplan > wrote: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. 
If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Aug 23 19:37:21 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 23 Aug 2018 18:37:21 +0000 Subject: [gpfsug-discuss] Are you attending IBM TechU in Hollywood, FL in October? Message-ID: <754D53F3-70C8-4481-9219-1665214C9302@nuance.com> Hi, if you are attending the IBM TechU in October, and are interested in giving a sort client perspective on Spectrum Scale, I?d like to hear from you. On October 15th, there will be a small ?mini-UG? session at this TechU and we?d like to include a client presentation. The rough outline is below, and as you can see it?s ?short and sweet?. Please drop me a note if you?d like to present. 10 mins ? Welcome & Introductions 45 mins ? Spectrum Scale/ESS Latest Enhancements and IBM Coral Project 30 mins - Spectrum Scale Use Cases 20 mins ? Spectrum Scale Client presentation 20 mins ? Spectrum Scale Roadmap 15 mins ? Questions & Close Close ? Drinks & Networking Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Sat Aug 25 01:12:08 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 24 Aug 2018 17:12:08 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> Message-ID: <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! > On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose wrote: > > All, don?t forget registration ends on the early side for this event due to background checks, etc. > > As noted below: > > IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. > > Hope you?ll be able to attend! > > Best, > Kristy > >> On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: >> >> All, >> >> Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: >> ? the draft agenda (bottom of page), >> ? a link to registration, register by September 1 due to ORNL site requirements (see next line) >> ? an important note about registration requirements for going to Oak Ridge National Lab >> ? 
a request for your site presentations >> ? information about HPCXXL and who to contact for information about joining, and >> ? other upcoming events. >> >> Hope you can attend and see Summit and Alpine first hand. >> >> Best, >> Kristy >> >> Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 >> >> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. >> >> ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. >> >> About HPCXXL: >> HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. >> The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. >> To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. >> >> Other upcoming GPFS/SS events: >> Sep 19+20 HPCXXL, Oak Ridge >> Aug 10 Meetup along TechU, Sydney >> Oct 24 NYC User Meeting, New York >> Nov 11 SC, Dallas >> Dec 12 CIUK, Manchester >> >> >> Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ >> Duration Start End Title >> >> Wednesday 19th, 2018 >> >> Speaker >> >> TBD >> Chris Maestas (IBM) TBD (IBM) >> TBD (IBM) >> John Lewars (IBM) >> >> *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) >> John Lewars (IBM) >> >> Carl Zetie (IBM) TBD >> >> TBD (ORNL) >> TBD (IBM) >> William Godoy (ORNL) Ted Hoover (IBM) >> >> Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All >> >> 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 >> >> 13:15 Welcome >> 13:45 What is new in Spectrum Scale? >> 14:00 What is new in ESS? 
>> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === >> 15:40 AWE >> 16:00 CSCS site report >> 16:20 Starfish (Sponsor talk) >> 16:50 Network Flow >> 17:20 RFEs >> 17:30 W rap-up >> >> Thursday 19th, 2018 >> >> 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 >> >> 08:50 Alpine ? the Summit file system >> 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library >> 10:00 AI Reference Architecture >> 10:30 === BREAK === >> 11:00 Encryption on the wire and on rest 11:30 Service Update >> 12:00 Open Forum >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ethan-Hereth at utc.edu Mon Aug 27 16:42:17 2018 From: Ethan-Hereth at utc.edu (Hereth, Ethan) Date: Mon, 27 Aug 2018 15:42:17 +0000 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov>, <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> Message-ID: Good morning gpfsug!! TLDR: What day is included in the free GPFS/SS UGM? Can somebody please confirm for me the date(s) for the free GPFS/SS workshop/UGM? Firstly, it appears as if it's on both the 19th and 20th, secondly, the Eventbrite form says that I need to be very accurate so I want to be sure. I'm just 1.5 hours away, so I'm hoping to drive up for the UGM. Cheers! -- Ethan Alan Hereth, PhD High Performance Computing Specialist SimCenter: National Center for Computational Engineering 701 East M.L. King Boulevard Chattanooga, TN 37403 [work]:423.425.5431 [cell]:423.991.4971 ethan-hereth at utc.edu www.utc.edu/simcenter ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose Sent: Friday, August 24, 2018 8:12:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose > wrote: All, don?t forget registration ends on the early side for this event due to background checks, etc. As noted below: IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Hope you?ll be able to attend! Best, Kristy On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: All, Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: ? the draft agenda (bottom of page), ? a link to registration, register by September 1 due to ORNL site requirements (see next line) ? an important note about registration requirements for going to Oak Ridge National Lab ? a request for your site presentations ? information about HPCXXL and who to contact for information about joining, and ? other upcoming events. Hope you can attend and see Summit and Alpine first hand. Best, Kristy Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. 
So don't wait too long to make your travel decisions. ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. About HPCXXL: HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. Other upcoming GPFS/SS events: Sep 19+20 HPCXXL, Oak Ridge Aug 10 Meetup along TechU, Sydney Oct 24 NYC User Meeting, New York Nov 11 SC, Dallas Dec 12 CIUK, Manchester Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ Duration Start End Title Wednesday 19th, 2018 Speaker TBD Chris Maestas (IBM) TBD (IBM) TBD (IBM) John Lewars (IBM) *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) John Lewars (IBM) Carl Zetie (IBM) TBD TBD (ORNL) TBD (IBM) William Godoy (ORNL) Ted Hoover (IBM) Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 13:15 Welcome 13:45 What is new in Spectrum Scale? 14:00 What is new in ESS? 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === 15:40 AWE 16:00 CSCS site report 16:20 Starfish (Sponsor talk) 16:50 Network Flow 17:20 RFEs 17:30 W rap-up Thursday 19th, 2018 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 08:50 Alpine ? the Summit file system 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library 10:00 AI Reference Architecture 10:30 === BREAK === 11:00 Encryption on the wire and on rest 11:30 Service Update 12:00 Open Forum -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kkr at lbl.gov Tue Aug 28 05:17:49 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 27 Aug 2018 21:17:49 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> Message-ID: <1802E998-2152-4FDE-9CE7-974203782317@lbl.gov> Two half-days are included. Wednesday 19th, 2018 starting 1p. Thursday 19th, 2018, starting 830 am. I believe there is a plan for a data center tour at the end of Thursday sessions "Summit Facility Tour? on the HPCXXL agenda. Let me know if there are other questions. -Kristy PS - Latest schedule is (PDF): > On Aug 27, 2018, at 8:42 AM, Hereth, Ethan wrote: > > Good morning gpfsug!! > > TLDR: What day is included in the free GPFS/SS UGM? > > Can somebody please confirm for me the date(s) for the free GPFS/SS workshop/UGM? Firstly, it appears as if it's on both the 19th and 20th, secondly, the Eventbrite form says that I need to be very accurate so I want to be sure. > > I'm just 1.5 hours away, so I'm hoping to drive up for the UGM. > > Cheers! > > -- > Ethan Alan Hereth, PhD > High Performance Computing Specialist > > SimCenter: National Center for Computational Engineering > 701 East M.L. King Boulevard > Chattanooga, TN 37403 > > [work]:423.425.5431 > [cell]:423.991.4971 > ethan-hereth at utc.edu > www.utc.edu/simcenter > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose > Sent: Friday, August 24, 2018 8:12:08 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 > > You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! > > >> On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose > wrote: >> >> All, don?t forget registration ends on the early side for this event due to background checks, etc. >> >> As noted below: >> >> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. >> >> Hope you?ll be able to attend! >> >> Best, >> Kristy >> >>> On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: >>> >>> All, >>> >>> Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: >>> ? the draft agenda (bottom of page), >>> ? a link to registration, register by September 1 due to ORNL site requirements (see next line) >>> ? an important note about registration requirements for going to Oak Ridge National Lab >>> ? a request for your site presentations >>> ? information about HPCXXL and who to contact for information about joining, and >>> ? other upcoming events. >>> >>> Hope you can attend and see Summit and Alpine first hand. >>> >>> Best, >>> Kristy >>> >>> Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 >>> >>> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. >>> >>> ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. 
>>> >>> About HPCXXL: >>> HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. >>> The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. >>> To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. >>> >>> Other upcoming GPFS/SS events: >>> Sep 19+20 HPCXXL, Oak Ridge >>> Aug 10 Meetup along TechU, Sydney >>> Oct 24 NYC User Meeting, New York >>> Nov 11 SC, Dallas >>> Dec 12 CIUK, Manchester >>> >>> >>> Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ >>> Duration Start End Title >>> Wednesday 19th, 2018 >>> Speaker >>> TBD >>> Chris Maestas (IBM) TBD (IBM) >>> TBD (IBM) >>> John Lewars (IBM) >>> *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) >>> John Lewars (IBM) >>> Carl Zetie (IBM) TBD >>> TBD (ORNL) >>> TBD (IBM) >>> William Godoy (ORNL) Ted Hoover (IBM) >>> Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All >>> 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 >>> 13:15 Welcome >>> 13:45 What is new in Spectrum Scale? >>> 14:00 What is new in ESS? >>> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === >>> 15:40 AWE >>> 16:00 CSCS site report >>> 16:20 Starfish (Sponsor talk) >>> 16:50 Network Flow >>> 17:20 RFEs >>> 17:30 W rap-up >>> Thursday 19th, 2018 >>> 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 >>> 08:50 Alpine ? the Summit file system >>> 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library >>> 10:00 AI Reference Architecture >>> 10:30 === BREAK === >>> 11:00 Encryption on the wire and on rest 11:30 Service Update >>> 12:00 Open Forum >>> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: SSUG18HPCXXL - Agenda - 2018-08-20.pdf Type: application/pdf Size: 109797 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Aug 28 05:51:33 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 27 Aug 2018 21:51:33 -0700 Subject: [gpfsug-discuss] Hiring at NERSC Message-ID: <3721D290-56CB-4D82-9C70-1AF4E2D82CB9@lbl.gov> Hi storage folks, We?re hiring here at NERSC. There are two openings on the storage team at the National Energy Research Scientific Computing Center (NERSC, Berkeley, CA). One for a storage systems administrator and the other for a storage systems developer. If you have questions about the job or the area, let me know. Check the job posting out here: http://m.rfer.us/LBLlpzxG http://m.rfer.us/LBLmOKxH Cheers, Kristy From r.sobey at imperial.ac.uk Tue Aug 28 11:09:23 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 28 Aug 2018 10:09:23 +0000 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: I?m coming late to the party on this so forgive me, but I found that even using QoS I could not even snapshot my filesets in a timely fashion, so my rebalancing could only run at weekends with snapshotting disabled. Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of David Johnson Sent: 20 August 2018 17:55 To: gpfsug main discussion list Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full. Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) ------------- -------------------- ------------------- (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Tue Aug 28 13:22:46 2018 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Tue, 28 Aug 2018 14:22:46 +0200 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC Message-ID: Hi all, I was looking into HAWC , using the 'distributed fast storage in client nodes' method ( https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm ) This is achieved by putting? a local device on the clients in the system.log pool. 
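[For anyone trying this, a rough sketch of the pieces involved. The device, NSD and node names, the failure group number and the threshold value are all placeholders, and the exact stanza attributes should be checked against the mmcrnsd/mmchfs documentation for your release.]

  # hawc.stanza -- declare a client-local fast device as an NSD in the system.log pool
  %nsd:
    device=/dev/nvme0n1p1
    nsd=client01_log
    servers=client01
    failureGroup=101
    pool=system.log

  mmcrnsd -F hawc.stanza
  mmadddisk fs0 -F hawc.stanza
  # HAWC itself is enabled by a non-zero write cache threshold (value illustrative, max 64K)
  mmchfs fs0 --write-cache-threshold 64K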
Reading another article (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm ) this would now be used for ALL File system recovery logs. Does this mean that if you have a (small) subset of clients with fast local devices added in the system.log pool, all other clients will use these too instead of the central system pool? Thank you! Kenneth From dod2014 at med.cornell.edu Wed Aug 29 03:51:08 2018 From: dod2014 at med.cornell.edu (Douglas Duckworth) Date: Tue, 28 Aug 2018 22:51:08 -0400 Subject: [gpfsug-discuss] More Drives For DDN 12KX Message-ID: Hi We have a 12KX which will be under support until 2020. Users are currently happy with throughput but we need greater capacity as approaching 80%. The enclosures are only half full. Does DDN require adding disks through them or can we get more 6TB SAS through someone else? We would want support contract for the new disks. If possible I think this would be a good stopgap solution until 2020 when we can buy a new faster cluster. Thank you for your feedback. -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Wed Aug 29 04:55:55 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 28 Aug 2018 20:55:55 -0700 Subject: [gpfsug-discuss] More Drives For DDN 12KX In-Reply-To: References: Message-ID: <20180829035555.GA32405@almaren> I would ask DDN this, but my guess is that even if the drives work, you would run into support headaches proving that whatever problem you're running into isn't the result of 3rd-party drives. Even with supported drives, we've run into drive firmware issues with almost all of our storage systems (not just DDN, but Isilon, Hitachi, EMC, etc.); for supported drives, it's a hassle to prove and then get updated, but it would be even worse without support on your side. On Tue, Aug 28, 2018 at 10:51:08PM -0400, Douglas Duckworth wrote: > Hi > > We have a 12KX which will be under support until 2020. Users are currently > happy with throughput but we need greater capacity as approaching 80%. > The enclosures are only half full. > > Does DDN require adding disks through them or can we get more 6TB SAS > through someone else? We would want support contract for the new disks. > If possible I think this would be a good stopgap solution until 2020 when > we can buy a new faster cluster. > > Thank you for your feedback. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From robert at strubi.ox.ac.uk Wed Aug 29 09:31:10 2018 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Wed, 29 Aug 2018 09:31:10 +0100 Subject: [gpfsug-discuss] More Drives For DDN 12KX In-Reply-To: References: Message-ID: Realistically I can't see why you'd want to risk invalidating the support contracts that you have in place. You'll also take on worrying about firmware etc etc that is normally taken care of! You will need the caddies as well. We've just done this exercise SFA12KXE and 6TB SAS drives and as well as doubling space we got significantly more performance (after mmrestripe, unless your network is the bottleneck). We left 10 free slots for a potential SSD upgrade (in case of a large increase in inodes or small files). 
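[On the mmrestripe point, a sketch of how the rebalance in the earlier thread could be throttled with QoS so it does not swamp user I/O. The device name, pool name and IOPS figure are placeholders, and the exact mmchqos syntax and values should be checked for your release.]

  # Cap QoS maintenance-class I/O before kicking off the rebalance
  mmchqos fs0 --enable pool=cit_10tb,maintenance=300IOps,other=unlimited
  # Rebalance just the one pool
  mmrestripefs fs0 -b -P cit_10tb
  # Watch the allotted/consumed rates, then lift the cap when done
  mmlsqos fs0
  mmchqos fs0 --disable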
Regards, Robert
-- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI)
-----Original Message----- From: "Douglas Duckworth" To: gpfsug-discuss at spectrumscale.org Date: 29/08/18 04:49 Subject: [gpfsug-discuss] More Drives For DDN 12KX
Hi We have a 12KX which will be under support until 2020. Users are currently happy with throughput but we need greater capacity as approaching 80%. The enclosures are only half full. Does DDN require adding disks through them or can we get more 6TB SAS through someone else? We would want support contract for the new disks. If possible I think this would be a good stopgap solution until 2020 when we can buy a new faster cluster. Thank you for your feedback. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jonathan.buzzard at strath.ac.uk Thu Aug 30 23:34:07 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 30 Aug 2018 23:34:07 +0100 Subject: [gpfsug-discuss] fast ACL alter solution In-Reply-To: <55CA6182.9010507@buzzard.me.uk> References: <201508111811.t7BIBYt0004336@d03av04.boulder.ibm.com> <55CA6182.9010507@buzzard.me.uk> Message-ID:
On 11/08/15 21:56, Jonathan Buzzard wrote: [SNIP]
>
> As I said previously what is needed is an "mm" version of the FreeBSD
> setfacl command
>
> http://www.freebsd.org/cgi/man.cgi?format=html&query=setfacl(1)
>
> That has the -R/--recursive option of the Linux setfacl command which
> uses the fast inode scanning GPFS API.
>
> You want to be able to type something like
>
> mmsetfacl -mR g:www:rpaRc::allow foo
>
> What you don't want to be doing is calling the abomination of a command
> that is mmputacl. Frankly whoever is responsible for that command needs
> taking out the back and given a good kicking.
A further three years down the line and setting NFSv4 ACLs on the Linux command line is still as painful as it was back in 2011. So I again have a requirement to set NFSv4 ACLs server side :-( Further, unfortunately somewhere in the last six years I lost my C code to do this :-( In the process of redoing it I have been looking at the source code for the Linux NFSv4 ACL tools. I think that with minimal modification they can be ported to GPFS. So far I have hacked up nfs4_getfacl to work, and it should not be too much extra effort to hack up nfs4_setfacl as well. However I have some questions. Firstly, what's the purpose of a special flag to indicate that it is smbd setting the ACL? Does this tie in with the undocumented "mmchfs -k samba" feature? Second, there is a whole bunch of stuff about v4.1 ACLs. How does one trigger that? All I seem to be able to do is get POSIX and v4 ACLs. Do you get v4.1 ACLs if you set the file system to "Samba" ACLs? Note, in the longer term I think it would be better to modify FreeBSD's setfacl/getfacl (say renamed to mmsetfacl and mmgetfacl) to do the job, on the basis that they handle both POSIX and NFSv4 ACLs in a single command. Perhaps an RFE? JAB. -- Jonathan A.
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From vtarasov at us.ibm.com Fri Aug 31 18:49:01 2018 From: vtarasov at us.ibm.com (Vasily Tarasov) Date: Fri, 31 Aug 2018 17:49:01 +0000 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Aug 31 19:25:34 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 31 Aug 2018 18:25:34 +0000 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC In-Reply-To: References: , Message-ID: I'm going to add a note of caution about HAWC as well... Firstly this was based on when it was first released,so things might have changed... HAWC replication uses the same failure group policy for placing replicas, therefore you need to use different failure groups for different client nodes. But do this carefully thinking about your failure domains. For example, we initially set each node in a cluster with its own failure group, might seem like a good idea until you shut the rack down (or even just a few select nodes might do it). You then lose your whole storage cluster by accident. (Or maybe you have hpc nodes and no UPS protection, if they have hawk and there is no protected replica, you lose the fs). Maybe this is obvious to everyone, but it bit us in various ways in our early testing. So if you plan to implement it, do test how your storage reacts when a client node fails. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of vtarasov at us.ibm.com [vtarasov at us.ibm.com] Sent: 31 August 2018 18:49 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] system.log pool on client nodes for HAWC That is correct. The blocks of each recovery log are striped across the devices in the system.log pool (if it is defined). As a result, even when all clients have a local device in the system.log pool, many writes to the recovery log will go to remote devices. For a client that lacks a local device in the system.log pool, log writes will always be remote. Notice, that typically in such a setup you would enable log replication for HA. Otherwise, if a single client fails (and its recover log is lost) the whole cluster fails as there is no log to recover FS to consistent state. Therefore, at least one remote write is essential. HTH, -- Vasily Tarasov, Research Staff Member, Storage Systems Research, IBM Research - Almaden ----- Original message ----- From: Kenneth Waegeman Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC Date: Tue, Aug 28, 2018 5:31 AM Hi all, I was looking into HAWC , using the 'distributed fast storage in client nodes' method ( https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm ) This is achieved by putting a local device on the clients in the system.log pool. Reading another article (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm ) this would now be used for ALL File system recovery logs. Does this mean that if you have a (small) subset of clients with fast local devices added in the system.log pool, all other clients will use these too instead of the central system pool? 
Thank you! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 17:55:04 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 16:55:04 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Message-ID: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Aug 1 18:21:01 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 13:21:01 -0400 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+????? 
??????????????????????????+ | Block size | Subblock size | +???????????????????????????????+????? ??????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+????? ??????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+????? ??????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+????? ??????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+????? ??????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Aug 1 19:21:28 2018 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 1 Aug 2018 14:21:28 -0400 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Marc A Kaplan" To: gpfsug main discussion list Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. 
Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:08:08 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:08:08 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu> Hi Marc, Thanks for the response ? I understand what you?re saying, but since I?m asking for a 1 MB block size for metadata and a 4 MB block size for data and according to the chart in the mmcrfs man page both result in an 8 KB sub block size I?m still confused as to why I?ve got a 32 KB sub block size for my non-system (i.e. data) pools? Especially when you consider that 32 KB isn?t the default even if I had chosen an 8 or 16 MB block size! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 12:21 PM, Marc A Kaplan > wrote: I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. 
Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd84fdde05c65406d4d9008d5f7d32f0f%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687408760535040&sdata=hqVZVIQLbxakARTspzbSkMZBHi2b6%2BIcrPLU1atNbus%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Aug 1 19:41:05 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 1 Aug 2018 11:41:05 -0700 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop wrote: > Marc, Kevin, > > We'll be looking into this issue, since at least at a first glance, it > does look odd. A 4MB block size should have resulted in an 8KB subblock > size. I suspect that, somehow, the *--metadata-block-size** 1M* may have > resulted in > > > 32768 Minimum fragment (subblock) size in bytes (other pools) > > but I do not yet understand how. > > The *subblocks-per-full-block* parameter is not supported with *mmcrfs *. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: graycol.gif]"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't > looked into all the details but here's a clue -- notice there is only one > "subblocks-per- > > From: "Marc A Kaplan" > > > To: gpfsug main discussion list > > Date: 08/01/2018 01:21 PM > Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I haven't looked into all the details but here's a clue -- notice there is > only one "subblocks-per-full-block" parameter. > > And it is the same for both metadata blocks and datadata blocks. > > So maybe (MAYBE) that is a constraint somewhere... > > Certainly, in the currently supported code, that's what you get. > > > > > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM > Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Our production cluster is still on GPFS 4.2.3.x, but in preparation for > moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS > 5.0.1-1. I am setting up a new filesystem there using hardware that we > recently life-cycled out of our production environment. > > I ?successfully? created a filesystem but I believe the sub-block size is > wrong. 
I?m using a 4 MB filesystem block size, so according to the mmcrfs > man page the sub-block size should be 8K: > > Table 1. Block sizes and subblock sizes > > +???????????????????????????????+???????????????????????????????+ > | Block size | Subblock size | > +???????????????????????????????+???????????????????????????????+ > | 64 KiB | 2 KiB | > +???????????????????????????????+???????????????????????????????+ > | 128 KiB | 4 KiB | > +???????????????????????????????+???????????????????????????????+ > | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | > | MiB, 4 MiB | | > +???????????????????????????????+???????????????????????????????+ > | 8 MiB, 16 MiB | 16 KiB | > +???????????????????????????????+???????????????????????????????+ > > However, it appears that it?s 8K for the system pool but 32K for the other > pools: > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) size in bytes (system pool) > 32768 Minimum fragment (subblock) size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of metadata replicas > -M 3 Maximum number of metadata replicas > -r 1 Default number of data replicas > -R 3 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 32 Estimated number of nodes that will mount file system > -B 1048576 Block size (system pool) > 4194304 Block size (other pools) > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Wed Aug 1 11:39:39 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea Yes Fast external attributes enabled? > --encryption No Encryption enabled? > --inode-limit 101095424 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per full block > -P system;raid1;raid6 Disk storage pools in file system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? 
> -d > test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd > Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs5 Default mount point > --mount-priority 0 Mount priority > > Output of mmcrfs: > > mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter > -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes > --nofilesetdf --metadata-block-size 1M > > The following disks of gpfs5 will be formatted on node testnsd3: > test21A3nsd: size 953609 MB > test21A4nsd: size 953609 MB > test21B3nsd: size 953609 MB > test21B4nsd: size 953609 MB > test23Ansd: size 15259744 MB > test23Bnsd: size 15259744 MB > test23Cnsd: size 1907468 MB > test24Ansd: size 15259744 MB > test24Bnsd: size 15259744 MB > test24Cnsd: size 1907468 MB > test25Ansd: size 15259744 MB > test25Bnsd: size 15259744 MB > test25Cnsd: size 1907468 MB > Formatting file system ... > Disks up to size 8.29 TB can be added to storage pool system. > Disks up to size 16.60 TB can be added to storage pool raid1. > Disks up to size 132.62 TB can be added to storage pool raid6. > Creating Inode File > 8 % complete on Wed Aug 1 11:39:19 2018 > 18 % complete on Wed Aug 1 11:39:24 2018 > 27 % complete on Wed Aug 1 11:39:29 2018 > 37 % complete on Wed Aug 1 11:39:34 2018 > 48 % complete on Wed Aug 1 11:39:39 2018 > 60 % complete on Wed Aug 1 11:39:44 2018 > 72 % complete on Wed Aug 1 11:39:49 2018 > 83 % complete on Wed Aug 1 11:39:54 2018 > 95 % complete on Wed Aug 1 11:39:59 2018 > 100 % complete on Wed Aug 1 11:40:01 2018 > Creating Allocation Maps > Creating Log Files > 3 % complete on Wed Aug 1 11:40:07 2018 > 28 % complete on Wed Aug 1 11:40:14 2018 > 53 % complete on Wed Aug 1 11:40:19 2018 > 78 % complete on Wed Aug 1 11:40:24 2018 > 100 % complete on Wed Aug 1 11:40:25 2018 > Clearing Inode Allocation Map > Clearing Block Allocation Map > Formatting Allocation Map for storage pool system > 85 % complete on Wed Aug 1 11:40:32 2018 > 100 % complete on Wed Aug 1 11:40:33 2018 > Formatting Allocation Map for storage pool raid1 > 53 % complete on Wed Aug 1 11:40:38 2018 > 100 % complete on Wed Aug 1 11:40:42 2018 > Formatting Allocation Map for storage pool raid6 > 20 % complete on Wed Aug 1 11:40:47 2018 > 39 % complete on Wed Aug 1 11:40:52 2018 > 60 % complete on Wed Aug 1 11:40:57 2018 > 79 % complete on Wed Aug 1 11:41:02 2018 > 100 % complete on Wed Aug 1 11:41:08 2018 > Completed creation of file system /dev/gpfs5. > mmcrfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. 
> > And contents of stanza file: > > %nsd: > nsd=test21A3nsd > usage=metadataOnly > failureGroup=210 > pool=system > servers=testnsd3,testnsd1,testnsd2 > device=dm-15 > > %nsd: > nsd=test21A4nsd > usage=metadataOnly > failureGroup=210 > pool=system > servers=testnsd1,testnsd2,testnsd3 > device=dm-14 > > %nsd: > nsd=test21B3nsd > usage=metadataOnly > failureGroup=211 > pool=system > servers=testnsd1,testnsd2,testnsd3 > device=dm-17 > > %nsd: > nsd=test21B4nsd > usage=metadataOnly > failureGroup=211 > pool=system > servers=testnsd2,testnsd3,testnsd1 > device=dm-16 > > %nsd: > nsd=test23Ansd > usage=dataOnly > failureGroup=23 > pool=raid6 > servers=testnsd2,testnsd3,testnsd1 > device=dm-10 > > %nsd: > nsd=test23Bnsd > usage=dataOnly > failureGroup=23 > pool=raid6 > servers=testnsd3,testnsd1,testnsd2 > device=dm-9 > > %nsd: > nsd=test23Cnsd > usage=dataOnly > failureGroup=23 > pool=raid1 > servers=testnsd1,testnsd2,testnsd3 > device=dm-5 > > %nsd: > nsd=test24Ansd > usage=dataOnly > failureGroup=24 > pool=raid6 > servers=testnsd3,testnsd1,testnsd2 > device=dm-6 > > %nsd: > nsd=test24Bnsd > usage=dataOnly > failureGroup=24 > pool=raid6 > servers=testnsd1,testnsd2,testnsd3 > device=dm-0 > > %nsd: > nsd=test24Cnsd > usage=dataOnly > failureGroup=24 > pool=raid1 > servers=testnsd2,testnsd3,testnsd1 > device=dm-2 > > %nsd: > nsd=test25Ansd > usage=dataOnly > failureGroup=25 > pool=raid6 > servers=testnsd1,testnsd2,testnsd3 > device=dm-6 > > %nsd: > nsd=test25Bnsd > usage=dataOnly > failureGroup=25 > pool=raid6 > servers=testnsd2,testnsd3,testnsd1 > device=dm-6 > > %nsd: > nsd=test25Cnsd > usage=dataOnly > failureGroup=25 > pool=raid1 > servers=testnsd3,testnsd1,testnsd2 > device=dm-3 > > %pool: > pool=system > blockSize=1M > usage=metadataOnly > layoutMap=scatter > allowWriteAffinity=no > > %pool: > pool=raid6 > blockSize=4M > usage=dataOnly > layoutMap=scatter > allowWriteAffinity=no > > %pool: > pool=raid1 > blockSize=4M > usage=dataOnly > layoutMap=scatter > allowWriteAffinity=no > > What am I missing or what have I done wrong? Thanks? > > Kevin > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > *Kevin.Buterbaugh at vanderbilt.edu* - > (615)875-9633 <(615)%20875-9633> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Aug 1 19:47:31 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 14:47:31 -0400 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? 
In-Reply-To: <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu> Message-ID:

I guess that particular table is not the whole truth, nor a specification, nor a promise, but a simplified summary of what you get when there is just one block size that applies to both metadata and data. You have discovered that it does not apply to systems where metadata has a different block size than data. My guesstimate (speculation!) is that the deployed code chooses one subblocks-per-full-block parameter and applies that to both, which would explain the results we're seeing. Further, it seems that the mmlsfs command assumes, at least in some places, that there is only one subblocks-per-block parameter... Looking deeper into the code is another story for another day -- but I'll say that there seems to be sufficient flexibility that if this were deemed a burning issue, there could be further "enhancements..." ;-)

From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 02:24 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi Marc, Thanks for the response -- I understand what you're saying, but since I'm asking for a 1 MB block size for metadata and a 4 MB block size for data, and according to the chart in the mmcrfs man page both result in an 8 KB sub-block size, I'm still confused as to why I've got a 32 KB sub-block size for my non-system (i.e. data) pools? Especially when you consider that 32 KB isn't the default even if I had chosen an 8 or 16 MB block size!

Kevin

Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Aug 1, 2018, at 12:21 PM, Marc A Kaplan wrote: I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get.

From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I "successfully" created a filesystem but I believe the sub-block size is wrong. I'm using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K. [...] What am I missing or what have I done wrong?

Thanks... Kevin
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:52:37 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:52:37 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu>

All, Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ... so does this go back to what Marc is saying that there's really only one sub blocks per block parameter? If so, is there any way to get what I want as described below?

Thanks... Kevin

Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L wrote:

Hi Sven, OK -- but why? I mean, that's not what the man page says. Where does that "4 x" come from?

And, most importantly -- that's not what I want. I want a smaller block size for the system pool since it's metadata only and on RAID 1 mirrors (HD's on the test cluster but SSD's on the production cluster). So -- side question -- is 1 MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block -- I want good performance for the sane people using our cluster without unduly punishing the -- ahem -- fine folks whose apps want to create a bazillion tiny files! So how do I do that?

Thanks!

Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Aug 1, 2018, at 1:41 PM, Sven Oehme wrote: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven

On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop wrote: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in "32768 Minimum fragment (subblock) size in bytes (other pools)" but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs.
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314

From carlz at us.ibm.com Wed Aug 1 20:10:50 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 1 Aug 2018 19:10:50 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Message-ID:

Kevin asks: >>>> Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ... so does this go back to what Marc is saying that there's really only one sub blocks per block parameter? If so, is there any way to get what I want as described below? <<<

Yep.
Basically what's happening is: When you ask for a certain block size, Scale infers the subblock size as shown in the table. As Sven said, here you are asking for 1M blocks for metadata, so you get 8KiB subblocks. So far so good. These two numbers together determine the number of subblocks per block parameter, which as Marc said is shared across all the pools. So in order for your 4M data blocks to have the same number of subblocks per block as your 1M metadata blocks, the subblocks have to be 4 times as big. Something similar would happen with *any* choice of data block size above 1M, of course. The smallest size wins, and the 8KiB number is coming from the 1M, not the 4M. (Thanks, Sven).

regards, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com

From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:47:47 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:47:47 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID:

Hi Sven, OK -- but why? I mean, that's not what the man page says. Where does that "4 x" come from?

And, most importantly -- that's not what I want. I want a smaller block size for the system pool since it's metadata only and on RAID 1 mirrors (HD's on the test cluster but SSD's on the production cluster). So -- side question -- is 1 MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block -- I want good performance for the sane people using our cluster without unduly punishing the -- ahem -- fine folks whose apps want to create a bazillion tiny files! So how do I do that?

Thanks!

Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
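To put numbers on Carl's explanation above: the 1 MiB metadata block size with its 8 KiB subblock fixes the shared parameter at 1 MiB / 8 KiB = 128 subblocks per full block, and applying that same 128 to the 4 MiB data pools gives 4 MiB / 128 = 32 KiB, which is exactly the 32768 reported by mmlsfs. A quick sanity check of the arithmetic (plain shell arithmetic only, nothing GPFS-specific; the numbers are taken from the example filesystem in this thread):

# 1 MiB metadata block / 8 KiB subblock = 128 subblocks per full block
echo $(( (1024 * 1024) / (8 * 1024) ))      # prints 128
# the same 128 applied to the 4 MiB data pools gives the data subblock size
echo $(( (4 * 1024 * 1024) / 128 ))         # prints 32768, i.e. 32 KiB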
From oehmes at gmail.com Wed Aug 1 22:01:28 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 1 Aug 2018 14:01:28 -0700 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> Message-ID:

the only way to get max number of subblocks for a 5.0.x filesystem with the released code is to have metadata and data use the same blocksize.
sven
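For the test filesystem above, Sven's suggestion would amount to recreating it with the same mmcrfs command but a matching metadata block size, along these lines (a sketch only, not run anywhere; every option except --metadata-block-size is copied unchanged from Kevin's original command, and the blockSize=1M line in the %pool stanza for the system pool would presumably need to become 4M as well):

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M

Per the table in the mmcrfs man page quoted at the top of the thread, a uniform 4 MiB block size should then give 8 KiB subblocks (512 subblocks per full block) in every pool.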
From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 22:58:26 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 21:58:26 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> Message-ID: <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu>

Hi Sven (and Stephen and everyone else),

I know there are certainly things you know but can't talk about, but I suspect that I am not the only one to wonder about the possible significance of "with the released code" in your response below?!? I understand the technical point you're making, and maybe the solution for me is to just use a 4 MB block size for my metadata-only system pool?

As Stephen Ulmer said in his response ("Why the desire for a 1MB block size for metadata? It is RAID1 so no re-write penalty or need to hit a stripe size. Are you just trying to save the memory? If you had a 4MB block size, an 8KB sub-block size and things were 4K-aligned, you would always read 2 4K inodes") -- so if I'm using RAID 1 with 4K inodes then am I gaining anything by going with a smaller block size for metadata?

So why was I choosing 1 MB in the first place? Well, I was planning on doing some experimenting with different block sizes for metadata to see if it made any difference. Historically, we had used a metadata block size of 64K to match the hardware "stripe" size on the storage arrays (RAID 1 mirrors of hard drives back in the day). Now our metadata is on SSDs, so with our latest filesystem we used 1 MB for both data and metadata because of the 1/32nd sub-block thing in GPFS 4.x. Since GPFS 5 removes that restriction, I was going to do some experimenting, but if the correct answer is just "if 4 MB is what's best for your data, then use it for metadata too" then I don't mind saving some time... ;-)

Thanks...

Kevin
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 4:01 PM, Sven Oehme > wrote: the only way to get max number of subblocks for a 5.0.x filesystem with the released code is to have metadata and data use the same blocksize. sven On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L > wrote: All, Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ? so does this go back to what Marc is saying that there?s really only one sub blocks per block parameter? If so, is there any way to get what I want as described below? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L > wrote: Hi Sven, OK ? but why? I mean, that?s not what the man page says. Where does that ?4 x? come from? And, most importantly ? that?s not what I want. I want a smaller block size for the system pool since it?s metadata only and on RAID 1 mirrors (HD?s on the test cluster but SSD?s on the production cluster). So ? side question ? is 1 MB OK there? But I want a 4 MB block size for data with an 8 KB sub block ? I want good performance for the sane people using our cluster without unduly punishing the ? ahem ? fine folks whose apps want to create a bazillion tiny files! So how do I do that? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:41 PM, Sven Oehme > wrote: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop > wrote: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per- From: "Marc A Kaplan" > To: gpfsug main discussion list > Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From makaplan at us.ibm.com Thu Aug 2 01:00:47 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 20:00:47 -0400 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID:
Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible?
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From abeattie at au1.ibm.com Thu Aug 2 01:11:51 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 2 Aug 2018 00:11:51 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: , <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Aug 2 01:52:19 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 1 Aug 2018 20:52:19 -0400 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org> > On Aug 1, 2018, at 8:11 PM, Andrew Beattie wrote: > [?] > > which is probably why 32k sub block was the default for so many years .... I may not be remembering correctly, but I thought the default block size was 256k, and the sub-block size was always fixed at 1/32nd of the block size ? which only yields 32k sub-blocks for a 1MB block size. I also think there used to be something special about a 16k block size? but I haven?t slept well in about a week, so I might just be losing it. -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Aug 2 02:10:10 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 2 Aug 2018 01:10:10 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org> References: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org>, <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Aug 2 09:44:02 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 2 Aug 2018 16:44:02 +0800 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org><3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: In released GPFS, we only support one subblocks-per-fullblock in one file system, like Sven mentioned that the subblocks-per-fullblock is derived by the smallest block size of metadata and data pools, the smallest block size decides the subblocks-per-fullblock and subblock size of all pools. There's an enhancement plan to have pools with different block sizes and/or subblocks-per-fullblock. Thanks, Yuan, Zheng Cai From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 2018/08/02 09:10 Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Stephen, Sorry your right, I had to go back and look up what we were doing for metadata. 
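To put rough numbers on the rule described just above (the smallest pool block size fixes the subblocks-per-full-block for every pool in the file system), here is a back-of-the-envelope sketch using only the values from the mmlsfs output Kevin posted; nothing new here, just the arithmetic spelled out:

    system (metadata) pool: 1 MiB block size with an 8 KiB subblock  ->  1 MiB / 8 KiB = 128 subblocks per full block
    all pools share that count, so the 4 MiB data pools get         ->  4 MiB / 128   = 32 KiB subblocks

which matches the "-f 8192 ... (system pool)", "32768 ... (other pools)" and "--subblocks-per-full-block 128" lines in that mmlsfs listing.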
but we ended up with 1MB block for metadata and 8MB for data and a 32k subblock based on the 1MB metadata block size, effectively a 256k subblock for the Data Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Stephen Ulmer Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 11:00 AM On Aug 1, 2018, at 8:11 PM, Andrew Beattie wrote: [?] which is probably why 32k sub block was the default for so many years .... I may not be remembering correctly, but I thought the default block size was 256k, and the sub-block size was always fixed at 1/32nd of the block size ? which only yields 32k sub-blocks for a 1MB block size. I also think there used to be something special about a 16k block size? but I haven?t slept well in about a week, so I might just be losing it. -- Stephen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Aug 2 16:56:20 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 2 Aug 2018 11:56:20 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: https://www.linkedin.com/in/oehmes/ Apparently, Sven is now "Chief Research Officer at DDN" -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Aug 2 17:01:58 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 2 Aug 2018 16:01:58 +0000 Subject: [gpfsug-discuss] Sven Oehme now at DDN Message-ID: <4D2B1925-2C14-47F8-A1A5-8E4EBA211462@nuance.com> Yes, I heard about this last week - Best of luck and congratulations Sven! I?m sure he?ll be around many of the GPFS events on the future. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, August 2, 2018 at 10:56 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Sven Oehme now at DDN https://www.linkedin.com/in/oehmes/ Apparently, Sven is now "Chief Research Officer at DDN" -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Aug 2 21:31:39 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 2 Aug 2018 20:31:39 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? 
In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... 
OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Aug 2 22:14:51 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 2 Aug 2018 21:14:51 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> Message-ID: OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? Kevin /root/gpfs root at testnsd1# mmdelfs gpfs5 All data on the following disks of gpfs5 will be destroyed: test21A3nsd test21A4nsd test21B3nsd test21B4nsd test23Ansd test23Bnsd test23Cnsd test24Ansd test24Bnsd test24Cnsd test25Ansd test25Bnsd test25Cnsd Completed deletion of file system /dev/gpfs5. mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 12 % complete on Thu Aug 2 13:16:26 2018 25 % complete on Thu Aug 2 13:16:31 2018 38 % complete on Thu Aug 2 13:16:36 2018 50 % complete on Thu Aug 2 13:16:41 2018 62 % complete on Thu Aug 2 13:16:46 2018 74 % complete on Thu Aug 2 13:16:52 2018 85 % complete on Thu Aug 2 13:16:57 2018 96 % complete on Thu Aug 2 13:17:02 2018 100 % complete on Thu Aug 2 13:17:03 2018 Creating Allocation Maps Creating Log Files 3 % complete on Thu Aug 2 13:17:09 2018 28 % complete on Thu Aug 2 13:17:15 2018 53 % complete on Thu Aug 2 13:17:20 2018 78 % complete on Thu Aug 2 13:17:26 2018 100 % complete on Thu Aug 2 13:17:27 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 98 % complete on Thu Aug 2 13:17:34 2018 100 % complete on Thu Aug 2 13:17:34 2018 Formatting Allocation Map for storage pool raid1 52 % complete on Thu Aug 2 13:17:39 2018 100 % complete on Thu Aug 2 13:17:43 2018 Formatting Allocation Map for storage pool raid6 24 % complete on Thu Aug 2 13:17:48 2018 50 % complete on Thu Aug 2 13:17:53 2018 74 % complete on Thu Aug 2 13:17:58 2018 99 % complete on Thu Aug 2 13:18:03 2018 100 % complete on Thu Aug 2 13:18:03 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmlsfs gpfs5 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Thu Aug 2 13:16:47 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority /root/gpfs root at testnsd1# ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L > wrote: Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... 
OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Aug 3 07:01:42 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 3 Aug 2018 06:01:42 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: Message-ID: Can u share your stanza file ? Von meinem iPhone gesendet > Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L : > > OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? > > Kevin > > /root/gpfs > root at testnsd1# mmdelfs gpfs5 > All data on the following disks of gpfs5 will be destroyed: > test21A3nsd > test21A4nsd > test21B3nsd > test21B4nsd > test23Ansd > test23Bnsd > test23Cnsd > test24Ansd > test24Bnsd > test24Cnsd > test25Ansd > test25Bnsd > test25Cnsd > Completed deletion of file system /dev/gpfs5. > mmdelfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > /root/gpfs > root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M > > The following disks of gpfs5 will be formatted on node testnsd3: > test21A3nsd: size 953609 MB > test21A4nsd: size 953609 MB > test21B3nsd: size 953609 MB > test21B4nsd: size 953609 MB > test23Ansd: size 15259744 MB > test23Bnsd: size 15259744 MB > test23Cnsd: size 1907468 MB > test24Ansd: size 15259744 MB > test24Bnsd: size 15259744 MB > test24Cnsd: size 1907468 MB > test25Ansd: size 15259744 MB > test25Bnsd: size 15259744 MB > test25Cnsd: size 1907468 MB > Formatting file system ... > Disks up to size 8.29 TB can be added to storage pool system. > Disks up to size 16.60 TB can be added to storage pool raid1. > Disks up to size 132.62 TB can be added to storage pool raid6. 
> Creating Inode File > 12 % complete on Thu Aug 2 13:16:26 2018 > 25 % complete on Thu Aug 2 13:16:31 2018 > 38 % complete on Thu Aug 2 13:16:36 2018 > 50 % complete on Thu Aug 2 13:16:41 2018 > 62 % complete on Thu Aug 2 13:16:46 2018 > 74 % complete on Thu Aug 2 13:16:52 2018 > 85 % complete on Thu Aug 2 13:16:57 2018 > 96 % complete on Thu Aug 2 13:17:02 2018 > 100 % complete on Thu Aug 2 13:17:03 2018 > Creating Allocation Maps > Creating Log Files > 3 % complete on Thu Aug 2 13:17:09 2018 > 28 % complete on Thu Aug 2 13:17:15 2018 > 53 % complete on Thu Aug 2 13:17:20 2018 > 78 % complete on Thu Aug 2 13:17:26 2018 > 100 % complete on Thu Aug 2 13:17:27 2018 > Clearing Inode Allocation Map > Clearing Block Allocation Map > Formatting Allocation Map for storage pool system > 98 % complete on Thu Aug 2 13:17:34 2018 > 100 % complete on Thu Aug 2 13:17:34 2018 > Formatting Allocation Map for storage pool raid1 > 52 % complete on Thu Aug 2 13:17:39 2018 > 100 % complete on Thu Aug 2 13:17:43 2018 > Formatting Allocation Map for storage pool raid6 > 24 % complete on Thu Aug 2 13:17:48 2018 > 50 % complete on Thu Aug 2 13:17:53 2018 > 74 % complete on Thu Aug 2 13:17:58 2018 > 99 % complete on Thu Aug 2 13:18:03 2018 > 100 % complete on Thu Aug 2 13:18:03 2018 > Completed creation of file system /dev/gpfs5. > mmcrfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > /root/gpfs > root at testnsd1# mmlsfs gpfs5 > flag value description > ------------------- ------------------------ ----------------------------------- > -f 8192 Minimum fragment (subblock) size in bytes (system pool) > 32768 Minimum fragment (subblock) size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of metadata replicas > -M 3 Maximum number of metadata replicas > -r 1 Default number of data replicas > -R 3 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 32 Estimated number of nodes that will mount file system > -B 1048576 Block size (system pool) > 4194304 Block size (other pools) > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Thu Aug 2 13:16:47 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea Yes Fast external attributes enabled? > --encryption No Encryption enabled? > --inode-limit 101095424 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per full block > -P system;raid1;raid6 Disk storage pools in file system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? 
> -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs5 Default mount point > --mount-priority 0 Mount priority > /root/gpfs > root at testnsd1# > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > >> On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L wrote: >> >> Hi All, >> >> Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. >> >> Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. >> >> So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. >> >> Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? >> >> Kevin >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 >> >>> On Aug 1, 2018, at 7:11 PM, Andrew Beattie wrote: >>> >>> I too would second the comment about doing testing specific to your environment >>> >>> We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. >>> >>> We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. >>> >>> Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance >>> >>> which is probably why 32k sub block was the default for so many years .... >>> Andrew Beattie >>> Software Defined Storage - IT Specialist >>> Phone: 614-2133-7927 >>> E-mail: abeattie at au1.ibm.com >>> >>> >>> ----- Original message ----- >>> From: "Marc A Kaplan" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >>> Cc: >>> Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? >>> Date: Thu, Aug 2, 2018 10:01 AM >>> >>> Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. >>> >>> Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. 
Sometimes they are available via commands, and/or configuration settings, sometimes not. >>> >>> Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". >>> >>> Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. >>> Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... >>> >>> OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Fri Aug 3 07:53:31 2018 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Fri, 3 Aug 2018 08:53:31 +0200 Subject: [gpfsug-discuss] Sven, the man with the golden gun now at DDN Message-ID: FYI - Sven is on a TOP secret mission called "Skyfall"; with his spirit, super tech skills and know-how he will educate and convert all the poor Lustre souls which are fighting for the world leadership. The GPFS-Q-team in Poughkeepsie has prepared him a golden Walther PPK (9mm) with lot's of Scale v5. silver bullets. He was given a top secret make_all_kind_of_I/O faster debugger with auto tuning features. And off course he received a new car by Aston Martin with lot's of special features designed by POK. It has dual V20-cores, lots of RAM, a Mestor-transmission, twin-port RoCE turbochargers, AFM Rockets and LROC escape seats. Poughkeepsie is still in the process to hire a larger group of smart and good looking NMVeOF I/O girls; feel free to send your ideas and pictures. The list of selected "Sven Girls" with be published in a new section in the Scale FAQ. -frank- -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Aug 3 13:49:48 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 3 Aug 2018 12:49:48 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: Message-ID: <11A27CF3-7484-45A8-ACFB-82B1F772A99B@vanderbilt.edu> Hi All, Aargh - now I really do feel like an idiot! 
I had set up the stanza file over a week ago ? then had to work on production issues ? and completely forgot about setting the block size in the pool stanzas there. But at least we all now know that stanza files override command line arguments to mmcrfs. My apologies? Kevin On Aug 3, 2018, at 1:01 AM, Olaf Weiser > wrote: Can u share your stanza file ? Von meinem iPhone gesendet Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L >: OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? Kevin /root/gpfs root at testnsd1# mmdelfs gpfs5 All data on the following disks of gpfs5 will be destroyed: test21A3nsd test21A4nsd test21B3nsd test21B4nsd test23Ansd test23Bnsd test23Cnsd test24Ansd test24Bnsd test24Cnsd test25Ansd test25Bnsd test25Cnsd Completed deletion of file system /dev/gpfs5. mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 12 % complete on Thu Aug 2 13:16:26 2018 25 % complete on Thu Aug 2 13:16:31 2018 38 % complete on Thu Aug 2 13:16:36 2018 50 % complete on Thu Aug 2 13:16:41 2018 62 % complete on Thu Aug 2 13:16:46 2018 74 % complete on Thu Aug 2 13:16:52 2018 85 % complete on Thu Aug 2 13:16:57 2018 96 % complete on Thu Aug 2 13:17:02 2018 100 % complete on Thu Aug 2 13:17:03 2018 Creating Allocation Maps Creating Log Files 3 % complete on Thu Aug 2 13:17:09 2018 28 % complete on Thu Aug 2 13:17:15 2018 53 % complete on Thu Aug 2 13:17:20 2018 78 % complete on Thu Aug 2 13:17:26 2018 100 % complete on Thu Aug 2 13:17:27 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 98 % complete on Thu Aug 2 13:17:34 2018 100 % complete on Thu Aug 2 13:17:34 2018 Formatting Allocation Map for storage pool raid1 52 % complete on Thu Aug 2 13:17:39 2018 100 % complete on Thu Aug 2 13:17:43 2018 Formatting Allocation Map for storage pool raid6 24 % complete on Thu Aug 2 13:17:48 2018 50 % complete on Thu Aug 2 13:17:53 2018 74 % complete on Thu Aug 2 13:17:58 2018 99 % complete on Thu Aug 2 13:18:03 2018 100 % complete on Thu Aug 2 13:18:03 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
/root/gpfs root at testnsd1# mmlsfs gpfs5 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Thu Aug 2 13:16:47 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority /root/gpfs root at testnsd1# ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L > wrote: Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? 
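To tie the eventual fix back to the stanza file from the start of the thread: since, as Kevin found, the %pool stanzas win over -B on the mmcrfs command line, the pool definitions themselves have to carry the intended block size. A minimal sketch of what the %pool lines would look like for 4M everywhere, reusing the stanza format Kevin posted (illustrative only, not a tested configuration; the %nsd lines are unchanged):

    %pool: pool=system blockSize=4M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no
    %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no
    %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no

With all three pools at 4M (matching -B 4M and --metadata-block-size 4M on the command line), mmlsfs should then report an 8K minimum fragment size for every pool, in line with the earlier discussion.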
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C89b5017f862b465a9ee908d5f9069a29%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688729119843837&sdata=0vjRu2TsZ5%2Bf84Sb7%2BTEdi8%2BmLGGpbqq%2FXNg2zfJRiw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Aug 3 20:37:50 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 3 Aug 2018 12:37:50 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 Message-ID: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> All, Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: ? the draft agenda (bottom of page), ? a link to registration, register by September 1 due to ORNL site requirements (see next line) ? an important note about registration requirements for going to Oak Ridge National Lab ? a request for your site presentations ? information about HPCXXL and who to contact for information about joining, and ? other upcoming events. Hope you can attend and see Summit and Alpine first hand. Best, Kristy Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. About HPCXXL: HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. 
We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. Other upcoming GPFS/SS events: Sep 19+20 HPCXXL, Oak Ridge Aug 10 Meetup along TechU, Sydney Oct 24 NYC User Meeting, New York Nov 11 SC, Dallas Dec 12 CIUK, Manchester Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ Duration Start End Title Wednesday 19th, 2018 Speaker TBD Chris Maestas (IBM) TBD (IBM) TBD (IBM) John Lewars (IBM) *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) John Lewars (IBM) Carl Zetie (IBM) TBD TBD (ORNL) TBD (IBM) William Godoy (ORNL) Ted Hoover (IBM) Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 13:15 Welcome 13:45 What is new in Spectrum Scale? 14:00 What is new in ESS? 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === 15:40 AWE 16:00 CSCS site report 16:20 Starfish (Sponsor talk) 16:50 Network Flow 17:20 RFEs 17:30 W rap-up Thursday 19th, 2018 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 08:50 Alpine ? the Summit file system 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library 10:00 AI Reference Architecture 10:30 === BREAK === 11:00 Encryption on the wire and on rest 11:30 Service Update 12:00 Open Forum -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Aug 6 19:34:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 6 Aug 2018 18:34:34 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: Message-ID: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> Hi All, So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do look at the documentation even if it seems sometimes that I don?t!) for some other information and happened to come across this at the bottom of page 358: The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a different block size from the user pools. This can be especially beneficial if the default block size is larger than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly disks. 
Given that one of the responses I received during this e-mail thread was from an IBM engineer basically pointing out that there is no benefit in setting the metadata-block-size to less than 4 MB if that?s what I want for the filesystem block size, this might be a candidate for a documentation update. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hnguyen at cray.com Mon Aug 6 20:52:28 2018 From: hnguyen at cray.com (Hoang Nguyen) Date: Mon, 6 Aug 2018 19:52:28 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> References: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> Message-ID: <7A96225E-B939-411F-B4C4-458DD4470B4D@cray.com> That comment in the Administration guide is a legacy comment when Metadata sub-block size was restricted to 1/32 of the Metadata block size. In the past, creating large Metadata block sizes also meant large sub-blocks and hence large directory blocks which wasted a lot of space. From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Monday, August 6, 2018 at 11:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Hi All, So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do look at the documentation even if it seems sometimes that I don?t!) for some other information and happened to come across this at the bottom of page 358: The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a different block size from the user pools. This can be especially beneficial if the default block size is larger than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly disks. Given that one of the responses I received during this e-mail thread was from an IBM engineer basically pointing out that there is no benefit in setting the metadata-block-size to less than 4 MB if that?s what I want for the filesystem block size, this might be a candidate for a documentation update. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Aug 6 22:42:54 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 6 Aug 2018 21:42:54 +0000 Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. 
However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From esperle at us.ibm.com Mon Aug 6 23:46:39 2018 From: esperle at us.ibm.com (Eric Sperley) Date: Mon, 6 Aug 2018 15:46:39 -0700 Subject: [gpfsug-discuss] mmaddcallback documentation issue In-Reply-To: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> References: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> Message-ID: See if this helps https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_mmaddcallback.htm Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From peter.chase at metoffice.gov.uk Tue Aug 7 12:35:17 2018 From: peter.chase at metoffice.gov.uk (Chase, Peter) Date: Tue, 7 Aug 2018 11:35:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue Message-ID: Hi Kevin, I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are not for your version of Spectrum Scale) state that mmstartpolicy is for internal GPFS use only: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Using+Policies So if the above link is correct, I'd recommend switching to using mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy doesn't and might have you fumbling around in the dark. As for the issue you're experiencing with adding a callback, it looks like the mmaddcallback command is catching the --single-instance flag as an argument for it, not as a parameter for the mmstartpolicy command. After looking at the documentation you've referenced, I suspect that there's a typo/omission in the command and it should have a trailing double quote (") on the end of the parms argument list, i.e.: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance" I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. Regards, Pete Chase peter.chase at metoffice.gov.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: 06 August 2018 23:47 To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. mmaddcallback documentation issue (Buterbaugh, Kevin L) 2. Re: mmaddcallback documentation issue (Eric Sperley) ---------------------------------------------------------------------- Message: 1 Date: Mon, 6 Aug 2018 21:42:54 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037 at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. 
However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Mon, 6 Aug 2018 15:46:39 -0700 From: "Eric Sperley" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: Content-Type: text/plain; charset="utf-8" See if this helps https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_mmaddcallback.htm Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 21 ********************************************** From UWEFALKE at de.ibm.com Tue Aug 7 13:30:48 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 7 Aug 2018 14:30:48 +0200 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: "I'm not sure how we go about asking IBM to correct their documentation,..." One way would be to open a PMR, er?, case. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From Kevin.Buterbaugh at Vanderbilt.Edu Tue Aug 7 17:14:27 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 7 Aug 2018 16:14:27 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: <3F1F205C-B3EB-44CF-BC47-84FDF335FBEF@vanderbilt.edu> Hi All, I was able to navigate down thru IBM?s website and find the GPFS 5.0.1 manuals but they contain the same typo, which Pete has correctly identified ? and I have confirmed that his solution works. Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 7, 2018, at 6:35 AM, Chase, Peter > wrote: Hi Kevin, I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are not for your version of Spectrum Scale) state that mmstartpolicy is for internal GPFS use only: https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral%2BParallel%2BFile%2BSystem%2B(GPFS)%2Fpage%2FUsing%2BPolicies&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912985631&sdata=4PmYIvmKenhqtLRVhusaQpWHAjGcd6YFMkb5nMa%2Bwuw%3D&reserved=0 So if the above link is correct, I'd recommend switching to using mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy doesn't and might have you fumbling around in the dark. As for the issue you're experiencing with adding a callback, it looks like the mmaddcallback command is catching the --single-instance flag as an argument for it, not as a parameter for the mmstartpolicy command. 
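A rough sketch of the argument splitting shows why (MIGRATION is just the callback identifier used in the guide; the paths and the event are as documented):

  # With the quote closed after %fsName, the shell hands --single-instance to
  # mmaddcallback itself, which has no such option:
  mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy \
      --event lowDiskSpace --parms "%eventName %fsName" --single-instance
  # mmaddcallback: Incorrect option: --single-instance
  #
  # Everything inside the --parms string is only expanded and handed to the
  # callback command (here mmstartpolicy) when the event actually fires, so
  # the flag has to sit inside that quoted string.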
After looking at the documentation you've referenced, I suspect that there's a typo/omission in the command and it should have a trailing double quote (") on the end of the parms argument list, i.e.: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance" I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. Regards, Pete Chase peter.chase at metoffice.gov.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: 06 August 2018 23:47 To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. mmaddcallback documentation issue (Buterbaugh, Kevin L) 2. Re: mmaddcallback documentation issue (Eric Sperley) ---------------------------------------------------------------------- Message: 1 Date: Mon, 6 Aug 2018 21:42:54 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037 at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ Message: 2 Date: Mon, 6 Aug 2018 15:46:39 -0700 From: "Eric Sperley" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: Content-Type: text/plain; charset="utf-8" See if this helps https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.1%2Fcom.ibm.spectrum.scale.v5r01.doc%2Fbl1adm_mmaddcallback.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=WGASrQ8SqzMdkTkNRkeAEDoaACsnDZEAJF8G5GBIxsA%3D&reserved=0 Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 End of gpfsug-discuss Digest, Vol 79, Issue 21 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Tue Aug 7 17:58:45 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Tue, 7 Aug 2018 16:58:45 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: >I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. File an RFE against Scale and I will route it to the right place. Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From carlz at us.ibm.com Wed Aug 8 13:24:52 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 8 Aug 2018 12:24:52 +0000 Subject: [gpfsug-discuss] Easy way to submit Documentation corrections and enhancements Message-ID: It turns out that there is an easier, faster way to submit corrections and enhancements to the Scale documentation than sending me an RFE. At the bottom of each page in the Knowledge Center, there is a Comments section. You just need to be signed in under your IBM ID to add a comment. And all of the comments are read and processed by our information design team. regards, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From ulmer at ulmer.org Thu Aug 9 05:46:12 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 9 Aug 2018 00:46:12 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> But it still shows him employed at IBM through ?present?. So is he on-loan or is it ?permanent?? -- Stephen > On Aug 2, 2018, at 11:56 AM, Marc A Kaplan wrote: > > https://www.linkedin.com/in/oehmes/ > Apparently, Sven is now "Chief Research Officer at DDN" > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Thu Aug 9 06:07:53 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 9 Aug 2018 07:07:53 +0200 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 9 14:18:40 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 9 Aug 2018 09:18:40 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu><151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> Message-ID: https://en.wikipedia.org/wiki/Coopetition -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Aug 9 20:11:27 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 9 Aug 2018 15:11:27 -0400 Subject: [gpfsug-discuss] logAssertFailed question Message-ID: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> Howdy All, We recently had a node running 4.2.3.6 (efix 9billion, sorry can't remember the exact efix) go wonky with a logAssertFailed error that looked similar to the description of this APAR fixed in 4.2.3.8: - Fix an assert in BufferDesc::flushBuffer Assert exp(!addrDirty || synchedStale || allDirty inode 554192 block 10 addrDirty 1 synchedStale 0 allDirty 0 that can happen during shutdown IJ04520 The odd thing is that APAR mentions the error can happen at shutdown and this node wasn't shutting down. In this APAR, can the error also occur when the node is not shutting down? 
Here's the head of the error we saw: Thu Aug 9 11:06:53.977 2018: [X] logAssertFailed: !addrDirty || synchedStale || allDirty Thu Aug 9 11:06:53.978 2018: [X] return code 0, reason code 0, log record tag 0 Thu Aug 9 11:06:57.557 2018: [X] *** Assert exp(!addrDirty || synchedStale || allDirty inode 96666844 snap 0 block 2034 bdP 0x1802F51DE40 addrDirty 1 synchedStale 0 allDirty 0 validBits 3x0-000000000003FFFF dirtyBits 3x0-000000000003FFFF ) in line 7316 of file /build/ode/ttn423ptf6/src/avs/fs/mmfs/ts/fs/bufdesc.C Thu Aug 9 11:06:57.558 2018: [E] *** Traceback: Thu Aug 9 11:06:57.559 2018: [E] 2:0x555555D6A016 logAssertFailed + 0x1B6 at ??:0 Thu Aug 9 11:06:57.560 2018: [E] 3:0x55555594B333 BufferDesc::flushBuffer(int, long long*) + 0x14A3 at ??:0 Thu Aug 9 11:06:57.561 2018: [E] 4:0x555555B483CE GlobalFS::LookForCleanToDo() + 0x2DE at ??:0 Thu Aug 9 11:06:57.562 2018: [E] 5:0x555555B48524 BufferCleanerBody(void*) + 0x74 at ??:0 Thu Aug 9 11:06:57.563 2018: [E] 6:0x555555868556 Thread::callBody(Thread*) + 0x46 at ??:0 Thu Aug 9 11:06:57.564 2018: [E] 7:0x555555855AF2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Thu Aug 9 11:06:57.565 2018: [E] 8:0x7FFFF79C5806 start_thread + 0xE6 at ??:0 Thu Aug 9 11:06:57.566 2018: [E] 9:0x7FFFF6B8567D clone + 0x6D at ??:0 mmfsd: /build/ode/ttn423ptf6/src/avs/fs/mmfs/ts/fs/bufdesc.C:7316: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `!addrDirty || synchedStale || allDirty inode 96666844 snap 0 block 2034 bdP 0x1802F51DE40 addrDirty 1 synchedStale 0 allDirty 0 validBits 3x0-000000000003FFFF dirtyBits 3x0-000000000003FFFF ' failed. Thu Aug 9 11:06:57.586 2018: [E] Signal 6 at location 0x7FFFF6AD9875 in process 10775, link reg 0xFFFFFFFFFFFFFFFF. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From valdis.kletnieks at vt.edu Thu Aug 9 20:25:47 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 09 Aug 2018 15:25:47 -0400 Subject: [gpfsug-discuss] logAssertFailed question In-Reply-To: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> References: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> Message-ID: <29489.1533842747@turing-police.cc.vt.edu> On Thu, 09 Aug 2018 15:11:27 -0400, Aaron Knister said: > We recently had a node running 4.2.3.6 (efix 9billion, sorry can't > remember the exact efix) go wonky with a logAssertFailed error that > looked similar to the description of this APAR fixed in 4.2.3.8: > > - Fix an assert in BufferDesc::flushBuffer Assert exp(!addrDirty || > synchedStale || allDirty inode 554192 block 10 addrDirty 1 synchedStale > 0 allDirty 0 that can happen during shutdown IJ04520 Yep. *that* one. Saw it often enough to put a serious crimp in our style. 'logAssertFailed: ! addrDirty || synchedStale || allDirty' It's *totally* possible to hit it in the middle of a production workload. I don't think we ever saw it during shutdown. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Stephan.Peinkofer at lrz.de Fri Aug 10 12:29:18 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 10 Aug 2018 11:29:18 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Message-ID: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Aug 10 13:51:56 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Aug 2018 14:51:56 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: <298030c14ce94fae8f21aefe9d736b84@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 10 14:02:33 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 10 Aug 2018 09:02:33 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: Questions: How/why was the decision made to use a large number (~1000) of independent filesets ? What functions/features/commands are being used that work with independent filesets, that do not also work with "dependent" filesets? -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.lischewski at fz-juelich.de Fri Aug 10 15:25:17 2018 From: m.lischewski at fz-juelich.de (Martin Lischewski) Date: Fri, 10 Aug 2018 16:25:17 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: Hello Olaf, hello Marc, we in J?lich are in the middle of migrating/copying all our old filesystems which were created with filesystem version: 13.23 (3.5.0.7) to new filesystems created with GPFS 5.0.1. We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota". The idea is to create a separate fileset for each group/project. For the users the quota-computation should be much more transparent. From now on all data which is stored inside of their directory (fileset) counts for their quota independent of the ownership. Right now we have round about 900 groups which means we will create round about 900 filesets per filesystem. In one filesystem we will have about 400million inodes (with rising tendency). 
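Per group the setup will look roughly like the following (a sketch with placeholder names and limits; the real inode and block limits will of course differ per project):

  # one independent fileset per group/project, with its own inode space
  mmcrfileset fs1 proj0815 --inode-space new --inode-limit 2000000:500000
  mmlinkfileset fs1 proj0815 -J /gpfs/fs1/proj0815
  # fileset quota: everything below the junction counts against the project,
  # independent of who owns the files
  mmsetquota fs1:proj0815 --block 20T:22T --files 2000000:2000000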
This filesystem we will back up with "mmbackup" so we talked with Dominic Mueller-Wicke and he recommended us to use independent filesets. Because then the policy-runs can be parallelized and we can increase the backup performance. We belive that we require these parallelized policies run to meet our backup performance targets. But there are even more features we enable by using independet filesets. E.g. "Fileset level snapshots" and "user and group quotas inside of a fileset". I did not know about performance issues regarding independent filesets... Can you give us some more information about this? All in all we are strongly supporting the idea of increasing this limit. Do I understand correctly that by opening a PMR IBM allows to increase this limit on special sides? I would rather like to increase the limit and make it official public available and supported. Regards, Martin Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > Hallo Stephan, > the limit is not a hard coded limit ?- technically spoken, you can > raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of > independent filesets generates a lot performance issues, hangs ... at > least noise and partial trouble .. > it might be not the case with your specific workload, because due to > the fact, that you 're running already ?close to 1000 ... > > I suspect , this number of 1000 file sets ?- at the time of > introducing it - was as also just that one had to pick a number... > > ... turns out.. that a general commitment to support > 1000 > ind.fileset is more or less hard.. because what uses cases should we > test / support > I think , there might be a good chance for you , that for your > specific workload, one would allow and support more than 1000 > > do you still have a PMR for your side for this ? ?- if not - I know .. > open PMRs is an additional ...but could you please .. > then we can decide .. if raising the limit is an option for you .. > > > > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, > Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron > , Dorian Krause > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the > J?lich Supercomputing Centre will soon be hitting the current > Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
> > There are also a number of RFEs from other users open, that target > this limitation: > _https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780_ > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534_ > __https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530_ > _https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282_ > > I know GPFS Development was very busy fulfilling the CORAL > requirements but maybe now there is again some time to improve > something else. > > If there are any other users on the list that are approaching the > current limitation in independent filesets, please take some time and > vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5118 bytes Desc: S/MIME Cryptographic Signature URL: From Stephan.Peinkofer at lrz.de Fri Aug 10 16:14:46 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 10 Aug 2018 15:14:46 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> , Message-ID: Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): HPC WORK: Here every project has - for the lifetime of the project - a dedicated storage area that has some fileset quota attached to it, but no further per user or per group quotas are applied here. No backup is taken. Data Science Storage: This is for long term online and collaborative storage. Here projects can get so called "DSS Containers" to which they can give arbitrary users access to via a Self Service Interface (a little bit like Dropbox). Each of this DSS Containers is implemented via a independent fileset so that projects can also specify a per user quota for invited users, we can backup each container efficiently into a different TSM Node via mmbackup and we can run different actions using the mmapplypolicy to a DSS Container. Also we plan to offer our users to enable snapshots on their containers if they wish so. We currently deploy a 2PB file system for this and are in the process of bringing up two additional 10PB file systems for this but already have requests what it would mean if we have to scale this to 50PB. Data Science Archive (Planned): This is for long term archive storage. The usage model will be something similar to DSS but underlying, we plan to use TSM/HSM. 
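For one DSS Container the moving parts look roughly like this (a sketch; file system, container, user and TSM server names are all placeholders):

  # per-user quota inside the container, so invited users can be limited
  # individually (needs --perfileset-quota on the file system)
  mmsetquota dssfs1:container42 --user alice --block 500G:550G
  # back up just this one container to its dedicated TSM server, scanning
  # only its own inode space instead of the whole file system
  mmbackup /gpfs/dssfs1/container42 -t incremental --scope inodespace --tsm-servers TSMSRV07
  # policy runs can be confined to the container in the same way
  mmapplypolicy /gpfs/dssfs1/container42 -P container42_actions.pol --scope inodespace -I yes
  # and, if the project asks for it, a fileset-level snapshot
  mmcrsnapshot dssfs1 container42_snap1 -j container42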
Another point, but I don't remember it completely from the top of my head, where people might hit the limit is when they are using your OpenStack Manila integration. As It think your Manila driver creates an independent fileset for each network share in order to be able to provide the per share snapshot feature. So if someone is trying to use ISS in a bigger OS Cloud as Manila Storage the 1000er limit might hit them also. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 3:02 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Questions: How/why was the decision made to use a large number (~1000) of independent filesets ? What functions/features/commands are being used that work with independent filesets, that do not also work with "dependent" filesets? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Fri Aug 10 16:39:50 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 10 Aug 2018 15:39:50 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, Message-ID: Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. 
if raising the limit is an option for you .. Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" To: gpfsug main discussion list Cc: Doris Franke , Uwe Tron , Dorian Krause Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Aug 10 16:51:28 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Aug 2018 15:51:28 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, Message-ID: <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> This is definitely a great candidate for a RFE, if one does not already exist. Not to try and contradict by friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general, which the RFE process is really the main way to do this. I just got off a call with Kristie and Carl about the RFE process and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!! 
So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE (admittedly currently got great) process really is and will be a great way to work together on these common goals and needs for the product we rely so heavily upon! Cheers!! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peinkofer, Stephan Sent: Friday, August 10, 2018 10:40 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. if raising the limit is an option for you .. Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. 
Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke >, Uwe Tron >, Dorian Krause > Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From djohnson at osc.edu Fri Aug 10 16:22:23 2018 From: djohnson at osc.edu (Doug Johnson) Date: Fri, 10 Aug 2018 11:22:23 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: Hi all, I want to chime in because this is precisely what we have done at OSC due to the same motivations Janell described. Our design was based in part on the guidelines in the "Petascale Data Protection" white paper from IBM. 
We only have ~200 filesets and 250M inodes today, but expect to grow. We are also very interested in details about performance issues and independent filesets. Can IBM elaborate? Best, Doug Martin Lischewski writes: > Hello Olaf, hello Marc, > > we in J?lich are in the middle of migrating/copying all our old filesystems which were created with filesystem > version: 13.23 (3.5.0.7) to new filesystems created with GPFS 5.0.1. > > We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. > 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota". > > The idea is to create a separate fileset for each group/project. For the users the quota-computation should be > much more transparent. From now on all data which is stored inside of their directory (fileset) counts for their > quota independent of the ownership. > > Right now we have round about 900 groups which means we will create round about 900 filesets per filesystem. > In one filesystem we will have about 400million inodes (with rising tendency). > > This filesystem we will back up with "mmbackup" so we talked with Dominic Mueller-Wicke and he recommended > us to use independent filesets. Because then the policy-runs can be parallelized and we can increase the backup > performance. We belive that we require these parallelized policies run to meet our backup performance targets. > > But there are even more features we enable by using independet filesets. E.g. "Fileset level snapshots" and "user > and group quotas inside of a fileset". > > I did not know about performance issues regarding independent filesets... Can you give us some more > information about this? > > All in all we are strongly supporting the idea of increasing this limit. > > Do I understand correctly that by opening a PMR IBM allows to increase this limit on special sides? I would rather > like to increase the limit and make it official public available and supported. > > Regards, > > Martin > > Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > > Hallo Stephan, > the limit is not a hard coded limit - technically spoken, you can raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot > performance issues, hangs ... at least noise and partial trouble .. > it might be not the case with your specific workload, because due to the fact, that you 're running already > close to 1000 ... > > I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a > number... > > ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what > uses cases should we test / support > I think , there might be a good chance for you , that for your specific workload, one would allow and support > more than 1000 > > do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you > please .. > then we can decide .. if raising the limit is an option for you .. 
> > Mit freundlichen Gr??en / Kind regards > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo > Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE > 99369940 > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron , Dorian Krause > > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit > Sent by: gpfsug-discuss-bounces at spectrumscale.org > --------------------------------------------------------------------------------------------------- > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will > soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. > > There are also a number of RFEs from other users open, that target this limitation: > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 > > I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again > some time to improve something else. > > If there are any other users on the list that are approaching the current limitation in independent filesets, > please take some time and vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Fri Aug 10 17:01:17 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Aug 2018 16:01:17 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> Message-ID: <01780289b9e14e599f848f78b33998d8@jumptrading.com> Just as a follow up to my own note, Stephan, already provided a list of existing RFEs from which to vote through the IBM RFE site, cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Bryan Banister Sent: Friday, August 10, 2018 10:51 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ This is definitely a great candidate for a RFE, if one does not already exist. 
Not to try and contradict by friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general, which the RFE process is really the main way to do this. I just got off a call with Kristie and Carl about the RFE process and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!! So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE (admittedly currently got great) process really is and will be a great way to work together on these common goals and needs for the product we rely so heavily upon! Cheers!! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peinkofer, Stephan Sent: Friday, August 10, 2018 10:40 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. if raising the limit is an option for you .. 
Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke >, Uwe Tron >, Dorian Krause > Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. 
You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 10 18:15:34 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 10 Aug 2018 13:15:34 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, Message-ID: I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): -------------- next part -------------- An HTML attachment was scrubbed... URL: From anobre at br.ibm.com Fri Aug 10 19:10:35 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 10 Aug 2018 18:10:35 +0000 Subject: [gpfsug-discuss] Top files on GPFS filesystem Message-ID: An HTML attachment was scrubbed... URL: From jake.carroll at uq.edu.au Sat Aug 11 03:18:28 2018 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Sat, 11 Aug 2018 02:18:28 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Message-ID: Just to chime in on this... We have experienced a lot of problems as a result of the independent fileset limitation @ 1000. 
We have a very large campus wide deployment that relies upon filesets for collection management of large (and small) scientific data outputs. Every human who uses our GPFS AFM fabric gets a "collection", which is an independent fileset. Some may say this was an unwise design choice - but it was deliberate and related to security, namespace and inode isolation. It is a considered decision. Just not considered _enough_ given the 1000 fileset limit ;). We've even had to go as far as re-organising entire filesystems (splitting things apart) to sacrifice performance (less spindles for the filesets on top of a filesystem) to work around it - and sometimes spill into entirely new arrays. I've had it explained to me by internal IBM staff *why* it is hard to fix the fileset limits - and it isn't as straightforward as people think - especially in our case where each fileset is an AFM cache/home relationship - but we desperately need a solution. We logged an RFE. Hopefully others do, also. The complexity has been explained to me by a very good colleague who has helped us a great deal inside IBM (name withheld to protect the innocent) as a knock on effect of the computational overhead and expense of things _associated_ with independent filesets, like recursing a snapshot tree. So - it really isn't as simple as things appear on the surface - but it doesn't mean we shouldn't try to fix it, I suppose! We'd love to see this improved, too - as it's currently making things difficult. Happy to collaborate and work together on this, as always. -jc ---------------------------------------------------------------------- Message: 1 Date: Fri, 10 Aug 2018 11:22:23 -0400 From: Doug Johnson Hi all, I want to chime in because this is precisely what we have done at OSC due to the same motivations Janell described. Our design was based in part on the guidelines in the "Petascale Data Protection" white paper from IBM. We only have ~200 filesets and 250M inodes today, but expect to grow. We are also very interested in details about performance issues and independent filesets. Can IBM elaborate? Best, Doug Martin Lischewski writes: > Hello Olaf, hello Marc, > > we in J?lich are in the middle of migrating/copying all our old > filesystems which were created with filesystem > version: 13.23 (3.5.0.7) to new filesystems created with GPFS 5.0.1. > > We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. > 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota". > > The idea is to create a separate fileset for each group/project. For > the users the quota-computation should be much more transparent. From > now on all data which is stored inside of their directory (fileset) counts for their quota independent of the ownership. > > Right now we have round about 900 groups which means we will create round about 900 filesets per filesystem. > In one filesystem we will have about 400million inodes (with rising tendency). > > This filesystem we will back up with "mmbackup" so we talked with > Dominic Mueller-Wicke and he recommended us to use independent > filesets. Because then the policy-runs can be parallelized and we can increase the backup performance. We belive that we require these parallelized policies run to meet our backup performance targets. > > But there are even more features we enable by using independet > filesets. E.g. "Fileset level snapshots" and "user and group quotas inside of a fileset". 
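As a purely illustrative aside for readers who have not used those two features - the file system, fileset and user names below (fs1, projA, alice) are invented, not taken from the Juelich setup - fileset-level snapshots and per-fileset user quotas look roughly like this on the command line:

# snapshot of just the independent fileset projA, rather than the whole file system
mmcrsnapshot fs1 projA_snap1 -j projA

# user quota checked only inside fileset projA (the file system must have --perfileset-quota enabled)
mmsetquota fs1:projA --user alice --block 500G:550G --files 1000000:1000000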
> > I did not know about performance issues regarding independent > filesets... Can you give us some more information about this? > > All in all we are strongly supporting the idea of increasing this limit. > > Do I understand correctly that by opening a PMR IBM allows to increase > this limit on special sides? I would rather like to increase the limit and make it official public available and supported. > > Regards, > > Martin > > Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > > Hallo Stephan, > the limit is not a hard coded limit - technically spoken, you can raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of > independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. > it might be not the case with your specific workload, because due to > the fact, that you 're running already close to 1000 ... > > I suspect , this number of 1000 file sets - at the time of > introducing it - was as also just that one had to pick a number... > > ... turns out.. that a general commitment to support > 1000 > ind.fileset is more or less hard.. because what uses cases should we > test / support I think , there might be a good chance for you , that > for your specific workload, one would allow and support more than > 1000 > > do you still have a PMR for your side for this ? - if not - I know .. > open PMRs is an additional ...but could you please .. > then we can decide .. if raising the limit is an option for you .. > > Mit freundlichen Gr??en / Kind regards > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ---------------------------------------------------------------------- > --------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > > ---------------------------------------------------------------------- > --------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, > Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE > 99369940 > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron > , Dorian Krause > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- > ----------------------------- > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the > J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
> > There are also a number of RFEs from other users open, that target this limitation: > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 56780 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 120534 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 106530 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 85282 > > I know GPFS Development was very busy fulfilling the CORAL > requirements but maybe now there is again some time to improve something else. > > If there are any other users on the list that are approaching the > current limitation in independent filesets, please take some time and vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss

------------------------------

Message: 2
Date: Fri, 10 Aug 2018 16:01:17 +0000
From: Bryan Banister
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit
Message-ID: <01780289b9e14e599f848f78b33998d8 at jumptrading.com>
Content-Type: text/plain; charset="iso-8859-1"

Just as a follow up to my own note, Stephan, already provided a list of existing RFEs from which to vote through the IBM RFE site, cheers, -Bryan
-------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 29 ********************************************** From Stephan.Peinkofer at lrz.de Sat Aug 11 08:03:13 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Sat, 11 Aug 2018 07:03:13 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, , Message-ID: <28219001a90040d489e7269aa20fc4ae@lrz.de> Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Aug 12 14:05:53 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 12 Aug 2018 09:05:53 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <28219001a90040d489e7269aa20fc4ae@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, , <28219001a90040d489e7269aa20fc4ae@lrz.de> Message-ID: That's interesting, I confess I never read that piece of documentation. 
What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Aug 13 07:10:04 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 13 Aug 2018 08:10:04 +0200 Subject: [gpfsug-discuss] Top files on GPFS filesystem In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 5698 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From Stephan.Peinkofer at lrz.de Mon Aug 13 08:26:00 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Mon, 13 Aug 2018 07:26:00 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> Message-ID: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Dear Marc, OK, so let?s give it a try: [root at datdsst100 pr74qo]# mmlsfileset dsstestfs01 Filesets in file system 'dsstestfs01': Name Status Path root Linked /dss/dsstestfs01 ... quota_test_independent Linked /dss/dsstestfs01/quota_test_independent quota_test_dependent Linked /dss/dsstestfs01/quota_test_independent/quota_test_dependent [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_independent --user a2822bp --block 1G:1G --files 10:10 [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_dependent --user a2822bp --block 10G:10G --files 100:100 [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 0 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i Looks good ? [root at datdsst100 pr74qo]# cd /dss/dsstestfs01/quota_test_independent/quota_test_dependent/ [root at datdsst100 quota_test_dependent]# for foo in `seq 1 99`; do touch file${foo}; chown a2822bp:pr28fa file${foo}; done [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 99 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i So it seems that per fileset per user quota is really not depending on independence. But what is the documentation then meaning with: >>> User group and user quotas can be tracked at the file system level or per independent fileset. ??? However, there still remains the problem with mmbackup and mmapplypolicy ? And if you look at some of the RFEs, like the one from DESY, they want even more than 10k independent filesets ? Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. 
(TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen Tel: +49(0)89 35831-8715 Fax: +49(0)89 35831-9700 URL: http://www.lrz.de On 12. Aug 2018, at 15:05, Marc A Kaplan > wrote: That's interesting, I confess I never read that piece of documentation. What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Marc A Kaplan > Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. 
--------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Aug 13 08:52:55 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 13 Aug 2018 09:52:55 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Aug 13 16:12:32 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 13 Aug 2018 11:12:32 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... Like many things in life, sometimes compromises are necessary! From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/13/2018 03:26 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, OK, so let?s give it a try: [root at datdsst100 pr74qo]# mmlsfileset dsstestfs01 Filesets in file system 'dsstestfs01': Name Status Path root Linked /dss/dsstestfs01 ... 
quota_test_independent Linked /dss/dsstestfs01/quota_test_independent quota_test_dependent Linked /dss/dsstestfs01/quota_test_independent/quota_test_dependent [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_independent --user a2822bp --block 1G:1G --files 10:10 [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_dependent --user a2822bp --block 10G:10G --files 100:100 [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 0 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i Looks good ? [root at datdsst100 pr74qo]# cd /dss/dsstestfs01/quota_test_independent/quota_test_dependent/ [root at datdsst100 quota_test_dependent]# for foo in `seq 1 99`; do touch file${foo}; chown a2822bp:pr28fa file${foo}; done [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 99 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i So it seems that per fileset per user quota is really not depending on independence. But what is the documentation then meaning with: >>> User group and user quotas can be tracked at the file system level or per independent fileset. ??? However, there still remains the problem with mmbackup and mmapplypolicy ? And if you look at some of the RFEs, like the one from DESY, they want even more than 10k independent filesets ? Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. (TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen Tel: +49(0)89 35831-8715 Fax: +49(0)89 35831-9700 URL: http://www.lrz.de On 12. Aug 2018, at 15:05, Marc A Kaplan wrote: That's interesting, I confess I never read that piece of documentation. What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. 
With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan < makaplan at us.ibm.com> Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Aug 13 19:48:20 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Mon, 13 Aug 2018 18:48:20 +0000 Subject: [gpfsug-discuss] TCP_QUICKACK Message-ID: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_QUICKACK socket flag on Linux? 
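For anyone not familiar with the flag: TCP_QUICKACK asks the kernel to send ACKs immediately rather than delaying them, and it is not sticky - the stack can drop back into delayed-ACK mode, so applications that rely on it typically re-arm it around receives. A minimal, purely illustrative C sketch of setting it (not taken from the GPFS source) looks like this:

#include <netinet/in.h>     /* IPPROTO_TCP */
#include <netinet/tcp.h>    /* TCP_QUICKACK (Linux-specific) */
#include <sys/socket.h>

/* Ask the kernel to ACK immediately on this connected TCP socket.
   Returns 0 on success, -1 on error with errno set. */
static int enable_quickack(int sockfd)
{
    int one = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}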
I?m debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I?m curious if GPFS is explicitly doing this or if there?s just a timing window in the RPC behavior that just makes it look that way. -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Aug 13 20:25:44 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 13 Aug 2018 15:25:44 -0400 Subject: [gpfsug-discuss] TCP_QUICKACK In-Reply-To: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> References: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> Message-ID: Hi Aaron, I just searched the core GPFS source code. I didn't find TCP_QUICKACK being used explicitly. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" To: gpfsug main discussion list Date: 08/13/2018 02:48 PM Subject: [gpfsug-discuss] TCP_QUICKACK Sent by: gpfsug-discuss-bounces at spectrumscale.org This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_QUICKACK socket flag on Linux? I?m debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I?m curious if GPFS is explicitly doing this or if there?s just a timing window in the RPC behavior that just makes it look that way. -Aaron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kkr at lbl.gov Tue Aug 14 01:09:24 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 13 Aug 2018 17:09:24 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> Message-ID: <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> All, don?t forget registration ends on the early side for this event due to background checks, etc. As noted below: IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Hope you?ll be able to attend! Best, Kristy > On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose wrote: > > All, > > Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: > ? the draft agenda (bottom of page), > ? a link to registration, register by September 1 due to ORNL site requirements (see next line) > ? 
an important note about registration requirements for going to Oak Ridge National Lab > ? a request for your site presentations > ? information about HPCXXL and who to contact for information about joining, and > ? other upcoming events. > > Hope you can attend and see Summit and Alpine first hand. > > Best, > Kristy > > Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 > > IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. > > ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. > > About HPCXXL: > HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. > The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. > To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. > > Other upcoming GPFS/SS events: > Sep 19+20 HPCXXL, Oak Ridge > Aug 10 Meetup along TechU, Sydney > Oct 24 NYC User Meeting, New York > Nov 11 SC, Dallas > Dec 12 CIUK, Manchester > > > Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ > Duration Start End Title > > Wednesday 19th, 2018 > > Speaker > > TBD > Chris Maestas (IBM) TBD (IBM) > TBD (IBM) > John Lewars (IBM) > > *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) > John Lewars (IBM) > > Carl Zetie (IBM) TBD > > TBD (ORNL) > TBD (IBM) > William Godoy (ORNL) Ted Hoover (IBM) > > Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All > > 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 > > 13:15 Welcome > 13:45 What is new in Spectrum Scale? > 14:00 What is new in ESS? 
> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === > 15:40 AWE > 16:00 CSCS site report > 16:20 Starfish (Sponsor talk) > 16:50 Network Flow > 17:20 RFEs > 17:30 W rap-up > > Thursday 19th, 2018 > > 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 > > 08:50 Alpine ? the Summit file system > 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library > 10:00 AI Reference Architecture > 10:30 === BREAK === > 11:00 Encryption on the wire and on rest 11:30 Service Update > 12:00 Open Forum > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Tue Aug 14 05:50:43 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Tue, 14 Aug 2018 04:50:43 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Aug 14 07:08:55 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 14 Aug 2018 06:08:55 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> , <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Message-ID: <4830FF9B-A443-4508-A8ED-B023B6EDD15C@huk-coburg.de> +1 great answer Stephan. We also dont understand why funktions are existend, but every time we want to use it, the first step is make a requirement. Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Am 14.08.2018 um 06:50 schrieb Peinkofer, Stephan >: Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 14 16:31:15 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 14 Aug 2018 11:31:15 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Message-ID: True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. 
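For illustration, backing up a single independent fileset rather than the whole file system might look something like the sketch below. The fileset junction path, TSM server name, and node list are invented for the example, and the exact option set should be checked against the mmbackup man page for the release in use:

    # Incremental backup limited to the inode space of one independent
    # fileset, addressed by its junction path (hypothetical path below).
    mmbackup /gpfs5/projects_grp01 --scope inodespace -t incremental \
             --tsm-servers TSMSERVER1 -N nsdnodes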
mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Aug 15 12:07:45 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 15 Aug 2018 11:07:45 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? Message-ID: Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? 
Thanks Simon From r.sobey at imperial.ac.uk Wed Aug 15 13:56:28 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 15 Aug 2018 12:56:28 +0000 Subject: [gpfsug-discuss] 5.0.1 and HSM Message-ID: Hi all, Is anyone running HSM who has also upgraded to 5.0.1? I'd be interested to know if it work(s) or if you had to downgrade back to 5.0.0.X or even 4.2.3.X. Officially the website says not supported, but we've been told (not verbatim) there's no reason why it wouldn't. We really don't want to have to upgrade to a Scale 5 release that's already not receiving any more PTFs but we may have to. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Aug 15 14:00:18 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 15 Aug 2018 13:00:18 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? In-Reply-To: References: Message-ID: Sorry, was able to download 5.0.1.1 DME just now, no issues. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson Sent: 15 August 2018 12:08 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 5.0.1-2 release? Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Wed Aug 15 19:37:50 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 15 Aug 2018 18:37:50 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? Message-ID: <65E22DAC-1FCE-424D-BE95-4C0D841194E1@nuance.com> 5.0.1.2 is now on Fix Central. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 8/15/18, 6:07 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=OYGVn5hlqVYT-aqb8EERr85EEm8p19iHHWkSpX7AeKc&s=91moEFA-0zhZicJFFWDd4iO2Wt7GhhuaDi6yvZqigrI&e= From carlz at us.ibm.com Thu Aug 16 13:28:22 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Thu, 16 Aug 2018 12:28:22 +0000 Subject: [gpfsug-discuss] Entitlements issues in Fix Central Message-ID: So... who wants to help us fix Fix Central? Two things: 1. I have seen a handful of issues in the last two weeks similar to what Simon and others have described: some versions of Scale download fine, others not. Some user IDs work, some get denied. And there is no obvious pattern or cause. We are looking at it, and more data points will help us track it down, so it would be a big help if everybody who encounters this reported it to Fix Central support: https://www.ibm.com/support/home/?lnk=fcw 2. An internal project is kicking off to improve Fix Central and Passport Advantage. If anybody would like to be a sponsor user in that project, contact me off-list. I can't guarantee participation, but I would love to get a couple of Scale users into the process. thanks, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From Dwayne.Hart at med.mun.ca Thu Aug 16 13:35:54 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 16 Aug 2018 12:35:54 +0000 Subject: [gpfsug-discuss] Entitlements issues in Fix Central In-Reply-To: References: Message-ID: <81C9FEC6-6BCF-433B-BEDB-B32A9B1A63B0@med.mun.ca> Hi Carl, I have access to both Fix Central and Passport Advantage. I?d like to assist in anyway I can. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 > On Aug 16, 2018, at 9:58 AM, Carl Zetie wrote: > > > So... who wants to help us fix Fix Central? > > Two things: > > 1. I have seen a handful of issues in the last two weeks similar to what Simon and others have described: some versions of Scale download fine, others not. Some user IDs work, some get denied. And there is no obvious pattern or cause. We are looking at it, and more data points will help us track it down, so it would be a big help if everybody who encounters this reported it to Fix Central support: > > https://www.ibm.com/support/home/?lnk=fcw > > > 2. An internal project is kicking off to improve Fix Central and Passport Advantage. If anybody would like to be a sponsor user in that project, contact me off-list. I can't guarantee participation, but I would love to get a couple of Scale users into the process. > > thanks, > > > > > > > > > > > > > > Carl Zetie > Offering Manager for Spectrum Scale, IBM > ---- > (540) 882 9353 ][ Research Triangle Park > carlz at us.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Stephan.Peinkofer at lrz.de Fri Aug 17 12:39:54 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 17 Aug 2018 11:39:54 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? 
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de>, Message-ID: Dear Marc, well as I think I cannot simply "move" dependent filesets between independent ones and our customers must have the opportunity to change data protection policy for their Containers at any given time, I cannot map them to a "backed up" or "not backed up" independent fileset. So how much performance impact is lets say 1-10 exclude.dir directives per independent fileset? Many thanks in advance. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Tuesday, August 14, 2018 5:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? 
;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 17 12:59:56 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 17 Aug 2018 07:59:56 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de><65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de>, Message-ID: My idea, not completely thought out, is that before you hit the 1000 limit, you start putting new customers or projects into dependent filesets and define those new dependent filesets within either a lesser number of independent filesets expressly created to receive the new customers OR perhaps even lump them with already existing independent filesets that have matching backup requirements. I would NOT try to create backups for each dependent fileset. But stick with the supported facilities to manage backup per independent... Having said that, if you'd still like to do backup per dependent fileset -- then have at it -- but test, test and retest.... And measure performance... My GUESS is that IF you can hack mmbackup or similar to use mmapplypolicy /path-to-dependent-fileset --scope fileset .... instead of mmapplypolicy /path-to-independent-fileset --scope inodespace .... You'll be okay because the inodescan where you end up reading some extra inodes is probably a tiny fraction of all the other IO you'll be doing! BUT I don't think IBM is in a position to encourage you to hack mmbackup -- it's already very complicated! From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/17/2018 07:40 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, well as I think I cannot simply "move" dependent filesets between independent ones and our customers must have the opportunity to change data protection policy for their Containers at any given time, I cannot map them to a "backed up" or "not backed up" independent fileset. So how much performance impact is lets say 1-10 exclude.dir directives per independent fileset? Many thanks in advance. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Tuesday, August 14, 2018 5:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... 
since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sat Aug 18 03:34:30 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 17 Aug 2018 22:34:30 -0400 Subject: [gpfsug-discuss] TCP_QUICKACK In-Reply-To: References: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> Message-ID: <3de256a6-c8f0-3e44-baf8-3f32fb0c4a06@nasa.gov> Thanks! Appreciate the quick answer. On 8/13/18 3:25 PM, IBM Spectrum Scale wrote: > Hi Aaron, > > I just searched the core GPFS source code. I didn't find TCP_QUICKACKbeing used explicitly. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for "Knister, Aaron S. 
(GSFC-606.2)[InuTeq, LLC]" ---08/13/2018 02:48:53 PM---This is a question mostly f"Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" ---08/13/2018 02:48:53 PM---This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_ > > From: "Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" > To: gpfsug main discussion list > Date: 08/13/2018 02:48 PM > Subject: [gpfsug-discuss] TCP_QUICKACK > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > This is a question mostly for the devs. but really for anyone who can answer. > > Does GPFS use the TCP_QUICKACK socket flag on Linux? > > I?m debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I?m curious if GPFS is explicitly doing this or if there?s just a timing window in the RPC behavior that just makes it look that way. > > -Aaron > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From david_johnson at brown.edu Mon Aug 20 17:55:18 2018 From: david_johnson at brown.edu (David Johnson) Date: Mon, 20 Aug 2018 12:55:18 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P Message-ID: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full. 
Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) ------------- -------------------- ------------------- (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Aug 20 19:02:05 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 20 Aug 2018 14:02:05 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: That should do what you want. Be aware that mmrestripefs generates significant IO load so you should either use the QoS feature to mitigate its impact or run the command when the system is not very busy. Note you have two additional NSDs in the 33 failure group than you do in the 23 failure group. You may want to change one of those NSDs in failure group 33 to be in failure group 23 so you have equal storage space in both failure groups. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: David Johnson To: gpfsug main discussion list Date: 08/20/2018 12:55 PM Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P Sent by: gpfsug-discuss-bounces at spectrumscale.org I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full. Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) ------------- -------------------- ------------------- (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) Will the command "mmrestripfs /gpfs -b -P cit_10tb? 
move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Mon Aug 20 19:06:23 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 20 Aug 2018 14:06:23 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: Does anyone have a good rule of thumb for iops to allow for background QOS tasks? -- ddj Dave Johnson > On Aug 20, 2018, at 2:02 PM, Frederick Stock wrote: > > That should do what you want. Be aware that mmrestripefs generates significant IO load so you should either use the QoS feature to mitigate its impact or run the command when the system is not very busy. > > Note you have two additional NSDs in the 33 failure group than you do in the 23 failure group. You may want to change one of those NSDs in failure group 33 to be in failure group 23 so you have equal storage space in both failure groups. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: David Johnson > To: gpfsug main discussion list > Date: 08/20/2018 12:55 PM > Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. > The new half is only 50% full, and the old half is 94% full. > > Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) > d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) > d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) > d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) > d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) > d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) > d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) > d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) > d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) > d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) > d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) > d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) > d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) > d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) > d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) > ------------- -------------------- ------------------- > (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) > > Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, > so that they end up all around 75% full? > > Thanks, > ? ddj > Dave Johnson > Brown University CCV/CIS_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Mon Aug 20 19:13:51 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 20 Aug 2018 11:13:51 -0700 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: Hey Dave, Can you say more about what you are trying to accomplish by doing the rebalance? IME, the performance hit from running the rebalance was higher than the performance hit from writes being directed to a subset of the disks. If you have any churn of the data, eventually they will rebalance anyway. Regards, Alex On Mon, Aug 20, 2018 at 11:06 AM wrote: > Does anyone have a good rule of thumb for iops to allow for background QOS > tasks? > > > > -- ddj > Dave Johnson > > On Aug 20, 2018, at 2:02 PM, Frederick Stock wrote: > > That should do what you want. Be aware that mmrestripefs generates > significant IO load so you should either use the QoS feature to mitigate > its impact or run the command when the system is not very busy. > > Note you have two additional NSDs in the 33 failure group than you do in > the 23 failure group. You may want to change one of those NSDs in failure > group 33 to be in failure group 23 so you have equal storage space in both > failure groups. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: David Johnson > To: gpfsug main discussion list > Date: 08/20/2018 12:55 PM > Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I have one storage pool that was recently doubled, and another pool > migrated there using mmapplypolicy. > The new half is only 50% full, and the old half is 94% full. > > Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) > d05_george_23 50.49T 23 No Yes 25.91T ( 51%) > 18.93G ( 0%) > d04_george_23 50.49T 23 No Yes 25.91T ( 51%) > 18.9G ( 0%) > d03_george_23 50.49T 23 No Yes 25.9T ( 51%) > 19.12G ( 0%) > d02_george_23 50.49T 23 No Yes 25.9T ( 51%) > 19.03G ( 0%) > d01_george_23 50.49T 23 No Yes 25.9T ( 51%) > 18.92G ( 0%) > d00_george_23 50.49T 23 No Yes 25.91T ( 51%) > 19.05G ( 0%) > d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.35G ( 0%) > d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.2G ( 0%) > d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 69.93G ( 0%) > d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) > 70.11G ( 0%) > d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.08G ( 0%) > d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) > 70.3G ( 0%) > d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) > 70.25G ( 0%) > d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) > 70.28G ( 0%) > ------------- -------------------- > ------------------- > (pool total) 706.9T 180.1T ( 25%) > 675.5G ( 0%) > > Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks > from the _cit_ NSDs to the _george_ NSDs, > so that they end up all around 75% full? > > Thanks, > ? 
ddj > Dave Johnson > Brown University CCV/CIS_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Aug 20 23:08:28 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 20 Aug 2018 18:08:28 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: <23047.1534802908@turing-police.cc.vt.edu> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: > Note you have two additional NSDs in the 33 failure group than you do in > the 23 failure group. You may want to change one of those NSDs in failure > group 33 to be in failure group 23 so you have equal storage space in both > failure groups. Keep in mind that the failure groups should be built up based on single points of failure. In other words, a failure group should consist of disks that will all stay up or all go down on the same failure (controller, network, whatever). Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', it sounds very likely that they are in two different storage arrays, and you should make your failure groups so they don't span a storage array. In other words, taking a 'cit' disk and moving it into a 'george' failure group will Do The Wrong Thing, because if you do data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk that's in the same array as the 'george' disk. If 'george' fails, you lose access to both replicas. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From david_johnson at brown.edu Mon Aug 20 23:21:08 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 20 Aug 2018 18:21:08 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <23047.1534802908@turing-police.cc.vt.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> <23047.1534802908@turing-police.cc.vt.edu> Message-ID: Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. I think we may leave things alone for now regarding the original question, rebalancing this pool. -- ddj Dave Johnson > On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: > > On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: > >> Note you have two additional NSDs in the 33 failure group than you do in >> the 23 failure group. 
You may want to change one of those NSDs in failure >> group 33 to be in failure group 23 so you have equal storage space in both >> failure groups. > > Keep in mind that the failure groups should be built up based on single points of failure. > In other words, a failure group should consist of disks that will all stay up or all go down on > the same failure (controller, network, whatever). > > Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', > it sounds very likely that they are in two different storage arrays, and you should make your > failure groups so they don't span a storage array. In other words, taking a 'cit' disk > and moving it into a 'george' failure group will Do The Wrong Thing, because if you do > data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk > that's in the same array as the 'george' disk. If 'george' fails, you lose access to both > replicas. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.s.knister at nasa.gov Tue Aug 21 01:05:07 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Tue, 21 Aug 2018 00:05:07 +0000 Subject: [gpfsug-discuss] fcntl ENOTTY Message-ID: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> Nothing worse than a vague question with little context, eh? Well... Does anyone know why GPFS might return ENOTTY to an fcntl(fd, F_SETLKW, &lock) where lock.l_type is set to F_RDLCK? The error prompting this question looks almost identical to the one in this (unfortunately unanswered) thread: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2014-June/000412.html -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Aug 21 04:28:19 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 20 Aug 2018 23:28:19 -0400 Subject: [gpfsug-discuss] fcntl ENOTTY In-Reply-To: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> References: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> Message-ID: <5e34373c-d6ff-fca7-4254-64958f636b69@nasa.gov> Argh... Please disregard (I think). Apparently, mpich uses "%X" to format errno (oh yeah, sure, why not use %p to print strings while we're at it) which means that the errno is *actually* 37 which is ENOLCK. Ok, now there's something I can work with. -Aaron p.s. I'm sure that formatting errno with %X made sense at the time (ok, no I'm not), but it sent me down a hell of a rabbit hole and I'm just bitter. No offense intended. On 8/20/18 8:05 PM, Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC] wrote: > Nothing worse than a vague question with little context, eh? Well... > > Does anyone know why GPFS might return ENOTTY to an fcntl(fd, F_SETLKW, &lock) where lock.l_type is set to F_RDLCK? 
> > The error prompting this question looks almost identical to the one in this (unfortunately unanswered) thread: > > http://www.spectrumscale.org/pipermail/gpfsug-discuss/2014-June/000412.html > > -Aaron > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From luis.bolinches at fi.ibm.com Tue Aug 21 05:11:24 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 21 Aug 2018 04:11:24 +0000 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: Message-ID: Hi You can enable QoS first to see the activity while on inf value to see the current values of usage and set the li is later on. Those limits are modificable online so even in case you have (not your case it seems) less activity times those can be increased for replication then and Lowe again on peak times. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 21 Aug 2018, at 1.21, david_johnson at brown.edu wrote: > > Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. > > I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. > > I think we may leave things alone for now regarding the original question, rebalancing this pool. > > -- ddj > Dave Johnson > >> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: >> >> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: >> >>> Note you have two additional NSDs in the 33 failure group than you do in >>> the 23 failure group. You may want to change one of those NSDs in failure >>> group 33 to be in failure group 23 so you have equal storage space in both >>> failure groups. >> >> Keep in mind that the failure groups should be built up based on single points of failure. >> In other words, a failure group should consist of disks that will all stay up or all go down on >> the same failure (controller, network, whatever). >> >> Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', >> it sounds very likely that they are in two different storage arrays, and you should make your >> failure groups so they don't span a storage array. In other words, taking a 'cit' disk >> and moving it into a 'george' failure group will Do The Wrong Thing, because if you do >> data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk >> that's in the same array as the 'george' disk. If 'george' fails, you lose access to both >> replicas. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Tue Aug 21 15:48:15 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Tue, 21 Aug 2018 14:48:15 +0000 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE804526743F1B@MBX114.d.ethz.ch> More precisely the problem is the following: If I set period=1 for a "rate" sensor (network speed, NSD read/write speed, PDisk read/write speed) everything is correct because every second the sensors get the valuess of the cumulative counters (and do not divide it by 1, which is not affecting anything for 1 second). If I set the period=2, the "rate" sensors collect the values from the cumulative counters every two seconds but they do not divide by 2 those values (because pmsensors do not actually divide; they seem to silly report what they read which is understand-able from a performance point of view); then grafana receives as double as the real speed. I've to correct myself: here the point is not how sampling/downsampling is done by grafana/grafana-bridge/whatever as I wrongly wrote in my first email. The point is: if I collect data every N seconds (because I do not want to overloads the pmcollector node), how can I divide (in grafana) the reported collected data by N to get real avg speed in that N-seconds time interval ?? At the moment it seems that the only option is using N=1, which is bad because, as I stated, it overloads the collector when many nodes run many pmsensors... A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com] Sent: Friday, July 27, 2018 8:27 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] How Zimon/Grafana-bridge process data Hi, as there are more often similar questions rising, we just put an article about the topic on the Spectrum Scale Wiki https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data While there will be some minor updates on the article in the next time, it might already explain your questions. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. [Inactive hide details for "Dorigo Alvise (PSI)" ---13.07.2018 12:08:59---Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 s]"Dorigo Alvise (PSI)" ---13.07.2018 12:08:59---Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 13.07.2018 12:08 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From makaplan at us.ibm.com Tue Aug 21 16:42:37 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 11:42:37 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P; using QOS features In-Reply-To: References: Message-ID: (Aside from QOS, I second the notion to review your "failure groups" if you are using and depending on data replication.) For QOS, some suggestions: You might want to define a set of nodes that will do restripes using `mmcrnodeclass restripers -N ...` You can initially just enable `mmchqos FS --enable` and then monitor performance of your restripefs command `mmrestripefs FS -b -N restripers` that restricts operations to the restripers nodeclass. 
with `mmlsqos FS --seconds 60 [[see other options]]` Suppose you see an average iops rates of several thousand IOPs and you decide that is interfering with other work... Then, for example, you could "slow down" or "pace" mmrestripefs to use 999 iops within the system pool and 1999 iops within the data pool with: mmchqos FS --enable -N restripers pool=system,maintenance=999iops pool=data,maintenance=1999iops And monitor that with mmlsqos. Tip: For a more graphical view of QOS and disk performance, try samples/charts/qosplotfine.pl. You will need to have gnuplot working... If you are "into" performance tools you might want to look at the --fine-stats options of mmchqos and mmlsqos and plug that into your favorite performance viewer/plotter/analyzer tool(s). (Technical: mmlsqos --fine-stats is written to be used and digested by scripts, no so much for human "eyeballing". The --fine-stats argument of mmchqos is a number of seconds. The --fine-stats argument of mmlsqos is one or two index values. The doc for mmlsqos explains this and the qosplotfine.pl script is an example of how to use it. ) From: "Luis Bolinches" To: "gpfsug main discussion list" Date: 08/21/2018 12:56 AM Subject: Re: [gpfsug-discuss] Rebalancing with mmrestripefs -P Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi You can enable QoS first to see the activity while on inf value to see the current values of usage and set the li is later on. Those limits are modificable online so even in case you have (not your case it seems) less activity times those can be increased for replication then and Lowe again on peak times. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 21 Aug 2018, at 1.21, david_johnson at brown.edu wrote: > > Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. > > I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. > > I think we may leave things alone for now regarding the original question, rebalancing this pool. > > -- ddj > Dave Johnson > >> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: >> >> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: >> >>> Note you have two additional NSDs in the 33 failure group than you do in >>> the 23 failure group. You may want to change one of those NSDs in failure >>> group 33 to be in failure group 23 so you have equal storage space in both >>> failure groups. >> >> Keep in mind that the failure groups should be built up based on single points of failure. >> In other words, a failure group should consist of disks that will all stay up or all go down on >> the same failure (controller, network, whatever). >> >> Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', >> it sounds very likely that they are in two different storage arrays, and you should make your >> failure groups so they don't span a storage array. 
In other words, taking a 'cit' disk >> and moving it into a 'george' failure group will Do The Wrong Thing, because if you do >> data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk >> that's in the same array as the 'george' disk. If 'george' fails, you lose access to both >> replicas. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Richard.E.Powell at boeing.com Tue Aug 21 19:23:50 2018 From: Richard.E.Powell at boeing.com (Powell (US), Richard E) Date: Tue, 21 Aug 2018 18:23:50 +0000 Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Hi all, I'm trying to use the "GROUP POOL" feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I'm having is that it seems to be identifying the candidates correctly but, anytime I use the "group pool" name for the "to pool", it only selects the first candidate for migration. If I specify a single pool name for the "to pool", it selects multiple files as expected. Here are the policy rules I'm using: RULE 'gp' GROUP POOL 'gpool' is 'ssd' then 'disk1' RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT) I'm not sure if I'm misunderstanding something or if this is a real bug. I'm just wondering if anyone else has run into this issue? I'm running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 21 20:45:10 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 15:45:10 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> References: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Message-ID: Migrate to a group pool "repacks" the selected files over the pools that comprise the group IN THE ORDER SPECIFIED UP TO THE SPECIFIED LIMIT for each pool. To see this work, in your case, set a limit that is near the current occupancy of pool 'ssd'. For example: RULE ?gp? GROUP POOL ?gpool? is ?ssd? LIMIT(50) then ?disk1? Notice the documentation says the LIMIT defaults to 99. Also, if you've run the same policy before and nothings changed much, then of course, there's not going to be much "repacking" to be done, maybe not any. If the behaviour still doesn't make sense to you, try testing on a tiny file system with just a few small pools, sizing pools and files so that only a few files will fit in a pool... If you build such a test scenario and that still doesn't make sense, show us the example... ----------------------------------- From: "Powell (US), Richard E" Hi all, I?m trying to use the ?GROUP POOL? 
feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I?m having is that it seems to be identifying the candidates correctly but, anytime I use the ?group pool? name for the ?to pool?, it only selects the first candidate for migration. If I specify a single pool name for the ?to pool?, it selects multiple files as expected. Here are the policy rules I?m using: RULE ?gp? GROUP POOL ?gpool? is ?ssd? then ?disk1? RULE ?repack? MIGRATE FROM POOL ?gpool? TO POOL ?gpool? WEIGHT(FILE_HEAT) I?m not sure if I?m misunderstanding something or if this is a real bug. I?m just wondering if anyone else has run into this issue? I?m running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 21 21:11:10 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 16:11:10 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: References: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Message-ID: To repack in random order, which might be an interesting and easy way to test and demonstrate... Use the RAND() function: RULE ... MIGRATE ... WEIGHT(RAND()) ... -L 3 on the mmapplypolicy command will make the random weights evident in the output. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Aug 22 18:12:24 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 22 Aug 2018 17:12:24 +0000 Subject: [gpfsug-discuss] Those users.... Message-ID: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Sometimes, I look at the data that's being stored in my file systems and just shake my head: /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Wed Aug 22 19:17:01 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 22 Aug 2018 14:17:01 -0400 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <422107E9-0AD1-49F8-99FD-D6713F90A844@ulmer.org> Clearly, those are the ones they?re working on. You?re lucky they?re de-duped. -- Stephen > On Aug 22, 2018, at 1:12 PM, Oesterlin, Robert wrote: > > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From linesr at janelia.hhmi.org Wed Aug 22 19:54:22 2018 From: linesr at janelia.hhmi.org (Lines, Robert) Date: Wed, 22 Aug 2018 18:54:22 +0000 Subject: [gpfsug-discuss] Those users.... Message-ID: Make a better storage system and they will find a better way to abuse it. A PI during an annual talk to the facility: Because databases are hard and file systems have done a far better job of scaling we have implemented our datastore using files, file name and directory names. 
It handles the high concurrency far better than any database server we could have built for the amount we are charged for that same very tiny amount of data. Ignoring that the internal pricing for storage is based on sane usage and not packing your entire data set into small enough files that it all lives in the SSD tier. So I feel for you. Rob From: on behalf of "Oesterlin, Robert" Reply-To: gpfsug main discussion list Date: Wednesday, August 22, 2018 at 1:12 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Those users.... Sometimes, I look at the data that's being stored in my file systems and just shake my head: /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Wed Aug 22 20:32:56 2018 From: bipcuds at gmail.com (Keith Ball) Date: Wed, 22 Aug 2018 15:32:56 -0400 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Message-ID: Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From Richard.E.Powell at boeing.com Wed Aug 22 21:17:44 2018 From: Richard.E.Powell at boeing.com (Powell (US), Richard E) Date: Wed, 22 Aug 2018 20:17:44 +0000 Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: Allow me to elaborate on my question. The example I gave was trimmed-down to the minimum. I've been trying various combinations with different LIMIT values and different weight and where clauses, using '-I test' and '-I prepare' to see what it would do, but not actually doing the migration. The 'ssd' pool is about 36% utilized and I've been starting the mmapplypolicy scan at a sub-directory level where nearly all the files were in the disk pool. (You'll just have to trust me that the ssd pool can hold all of them :-)) If I specify 'ssd' as the "to pool", the output from the test or prepare options indicates that it would be able to migrate all of the candidate files to the ssd pool. But, if I specify the group pool as the "to pool", it is only willing to migrate the first candidate. That is with the ssd pool listed first in the group and with any limit as long as it's big enough to hold the current data plus the files I expected it to select, even the default of 99. I'm sure I'm either doing something wrong, or I *really* misunderstand the concept. It seems straight forward enough.... Thanks to everyone for your time! 
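For reference, a dry-run version of this experiment, along the lines of Marc's LIMIT suggestion, might look like the following. The policy file name, the scan path and the LIMIT value are placeholders (the LIMIT should sit near the current occupancy of the ssd pool, about 36% in this case, so something like LIMIT(40)), and -I test with -L 2 only reports what would be migrated without moving anything:

/* repack.pol: group pool with an occupancy cap on the first (fastest) pool */
RULE 'gp' GROUP POOL 'gpool' is 'ssd' LIMIT(40) then 'disk1'
RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT)

mmapplypolicy /gpfs/somedir -P repack.pol -I test -L 2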
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, August 22, 2018 4:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 47 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Problem using group pool for migration (Powell (US), Richard E) 2. Re: Problem using group pool for migration (Marc A Kaplan) 3. Re: Problem using group pool for migration (Marc A Kaplan) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Aug 2018 18:23:50 +0000 From: "Powell (US), Richard E" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: <7a0a914601594ccdb6c96504322de9c8 at XCH15-09-11.nw.nos.boeing.com> Content-Type: text/plain; charset="us-ascii" Hi all, I'm trying to use the "GROUP POOL" feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I'm having is that it seems to be identifying the candidates correctly but, anytime I use the "group pool" name for the "to pool", it only selects the first candidate for migration. If I specify a single pool name for the "to pool", it selects multiple files as expected. Here are the policy rules I'm using: RULE 'gp' GROUP POOL 'gpool' is 'ssd' then 'disk1' RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT) I'm not sure if I'm misunderstanding something or if this is a real bug. I'm just wondering if anyone else has run into this issue? I'm running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 21 Aug 2018 15:45:10 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem using group pool for migration Message-ID: Content-Type: text/plain; charset="utf-8" Migrate to a group pool "repacks" the selected files over the pools that comprise the group IN THE ORDER SPECIFIED UP TO THE SPECIFIED LIMIT for each pool. To see this work, in your case, set a limit that is near the current occupancy of pool 'ssd'. For example: RULE ?gp? GROUP POOL ?gpool? is ?ssd? LIMIT(50) then ?disk1? Notice the documentation says the LIMIT defaults to 99. Also, if you've run the same policy before and nothings changed much, then of course, there's not going to be much "repacking" to be done, maybe not any. If the behaviour still doesn't make sense to you, try testing on a tiny file system with just a few small pools, sizing pools and files so that only a few files will fit in a pool... If you build such a test scenario and that still doesn't make sense, show us the example... ----------------------------------- From: "Powell (US), Richard E" Hi all, I?m trying to use the ?GROUP POOL? feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. 
The problem I?m having is that it seems to be identifying the candidates correctly but, anytime I use the ?group pool? name for the ?to pool?, it only selects the first candidate for migration. If I specify a single pool name for the ?to pool?, it selects multiple files as expected. Here are the policy rules I?m using: RULE ?gp? GROUP POOL ?gpool? is ?ssd? then ?disk1? RULE ?repack? MIGRATE FROM POOL ?gpool? TO POOL ?gpool? WEIGHT(FILE_HEAT) I?m not sure if I?m misunderstanding something or if this is a real bug. I?m just wondering if anyone else has run into this issue? I?m running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Tue, 21 Aug 2018 16:11:10 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem using group pool for migration Message-ID: Content-Type: text/plain; charset="us-ascii" To repack in random order, which might be an interesting and easy way to test and demonstrate... Use the RAND() function: RULE ... MIGRATE ... WEIGHT(RAND()) ... -L 3 on the mmapplypolicy command will make the random weights evident in the output. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 47 ********************************************** From valdis.kletnieks at vt.edu Wed Aug 22 21:35:57 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 22 Aug 2018 16:35:57 -0400 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <168045.1534970157@turing-police.cc.vt.edu> On Wed, 22 Aug 2018 17:12:24 -0000, "Oesterlin, Robert" said: > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) I've got 114,029 files of the form: /gpfs/archive/tenant/this/that/F:\the\other\thing\what\where\they\thinking/apparently/not/much.dat I admit being mystified - how does such a mess happen? (Note that our tenant users are only able to access the GPFS filesystem through NFS - which is only exported to other Linux systems....) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From jfosburg at mdanderson.org Wed Aug 22 21:44:29 2018 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Wed, 22 Aug 2018 20:44:29 +0000 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <168045.1534970157@turing-police.cc.vt.edu> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <168045.1534970157@turing-police.cc.vt.edu> Message-ID: <42A96B62-CD95-458B-A702-F6ECFAC66AEF@mdanderson.org> A very, very long time ago we had an AIX system (4.3 with jfs1) where the users logged in interactively. We would find files with names like: /C:\some\very \non-posix\path/file There's a reason they're called lusers. 
?On 8/22/18, 3:36 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 22 Aug 2018 17:12:24 -0000, "Oesterlin, Robert" said: > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) I've got 114,029 files of the form: /gpfs/archive/tenant/this/that/F:\the\other\thing\what\where\they\thinking/apparently/not/much.dat I admit being mystified - how does such a mess happen? (Note that our tenant users are only able to access the GPFS filesystem through NFS - which is only exported to other Linux systems....) The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. From jonathan.buzzard at strath.ac.uk Wed Aug 22 23:37:55 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 22 Aug 2018 23:37:55 +0100 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: On 22/08/18 18:12, Oesterlin, Robert wrote: > Sometimes, I look at the data that's being stored in my file systems and > just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains > 17,967,350 files (in ONE directory) > That's what inode quota's are for. Set it pretty high to begin with, say one million. That way the vast majority of users have no issues ever. Then the troublesome few will have issues at which point you can determine why they are storing so many files, and appropriately educate them on better ways to do it. Finally if they really need that many files just charge them for it :-) Having lots of files has a cost just like having lots of data has a cost, and it's not fair for the reasonable users to subsidize them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Thu Aug 23 00:02:28 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 22 Aug 2018 23:02:28 +0000 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Aug 23 01:59:11 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 22 Aug 2018 17:59:11 -0700 Subject: [gpfsug-discuss] Those users.... 
In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <20180823005911.GA5982@almaren> On Wed, Aug 22, 2018 at 11:37:55PM +0100, Jonathan Buzzard wrote: > On 22/08/18 18:12, Oesterlin, Robert wrote: > >Sometimes, I look at the data that's being stored in my file systems and > >just shake my head: > > > >/gpfs//Restricted/EventChangeLogs/deduped/working contains > >17,967,350 files (in ONE directory) > > > > That's what inode quota's are for. Set it pretty high to begin with, say one > million. That way the vast majority of users have no issues ever. Then the > troublesome few will have issues at which point you can determine why they > are storing so many files, and appropriately educate them on better ways to > do it. Finally if they really need that many files just charge them for it > :-) Having lots of files has a cost just like having lots of data has a > cost, and it's not fair for the reasonable users to subsidize them. Yep, we set our fileset inode quota to 1 million/TB of allocated space. It seems overly generous to me but it's far better than no limit at all. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From rohwedder at de.ibm.com Thu Aug 23 09:51:39 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Thu, 23 Aug 2018 10:51:39 +0200 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Keith Ball To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: ??? ??????? ??? but it appears that port 80 specifically is used also by the GUI's Web service. 
I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, ? Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 15917110.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Juri.Haberland at rohde-schwarz.com Thu Aug 23 10:24:38 2018 From: Juri.Haberland at rohde-schwarz.com (Juri Haberland) Date: Thu, 23 Aug 2018 09:24:38 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Message-ID: Hello Markus, I?m not sure how to interpret your answer: Do the internal processes connect to the non-privileged ports (47443 and 47080) or the privileged ports? If they use the privileged ports we would appreciate it if IBM could change that behavior to using the non-privileged ports so one could change the privileged ones or use a httpd server in front of the GUI web service. We are going to need this in the near future as well? Thanks & kind regards. Juri Haberland -- Juri Haberland R&D SW File Based Media Solutions | 7TF1 Rohde & Schwarz GmbH & Co. KG Hanomaghof 1 | 30449 Hannover Phone: +49 511 678 07 246 | Fax: +49 511 678 07 200 Internet: www.rohde-schwarz.com Gesch?ftsf?hrung / Executive Board: Christian Leicher (Vorsitzender / Chairman), Peter Riedel, Sitz der Gesellschaft / Company's Place of Business: M?nchen, Registereintrag / Commercial Register No.: HRA 16 270, Pers?nlich haftender Gesellschafter / Personally Liable Partner: RUSEG Verwaltungs-GmbH, Sitz der Gesellschaft / Company's Place of Business: M?nchen, Registereintrag / Commercial Register No.: HRB 7 534, Umsatzsteuer-Identifikationsnummer (USt-IdNr.) / VAT Identification No.: DE 130 256 683, Elektro-Altger?te Register (EAR) / WEEE Register No.: DE 240 437 86 From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Markus Rohwedder Sent: Thursday, August 23, 2018 10:52 AM To: gpfsug main discussion list Subject: *EXT* [Newsletter] Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. 
If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:image003.png at 01D43AD3.9FE459C0] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the]Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? From: Keith Ball > To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 166 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 4659 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.gif Type: image/gif Size: 105 bytes Desc: image004.gif URL: From daniel.kidger at uk.ibm.com Thu Aug 23 11:13:04 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 23 Aug 2018 10:13:04 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.3__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From rohwedder at de.ibm.com Thu Aug 23 12:50:32 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Thu, 23 Aug 2018 13:50:32 +0200 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: , Message-ID: Hello Juri, Keith, thank you for your responses. 
The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. However we have not set up, tested nor documented such a configuration as of today. Currently the GUI service manages the iptables redirect bring up and tear down. If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to - not check for already used ports when the GUI service tries to start up - don't create/destroy port forwarding rules during GUI service start and stop. This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. See this KC link: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Daniel Kidger" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 23.08.2018 12:13 Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case because they had their own web based management interface on the same https port. Is this the same reason that you have? If so I wonder how many other sites have the same issue? One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). Then run the two different GUIs, one per IP address. Is this an option, albeit a little ugly? Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-(0)7818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Markus Rohwedder" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Date: Thu, Aug 23, 2018 9:51 AM Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. 
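To make the idea of externally managed, per-IP redirects concrete, rules along the following lines could stand in for the ones the GUI normally creates itself. This is only a sketch under the assumptions above: UPDATE_IPTABLES has been disabled in the GUI service configuration as described, 192.0.2.10 is a placeholder address reserved for the GUI, 47443 and 47080 are the GUI's native ports mentioned earlier, and loopback traffic and the OUTPUT chain are not covered here.

# Redirect HTTPS/HTTP to the GUI only when clients hit the GUI's address,
# leaving ports 80/443 on the node's other addresses free for xCAT or httpd
iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 443 -j REDIRECT --to-ports 47443
iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 80 -j REDIRECT --to-ports 47080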
Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany Inactive hide details for Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? From: Keith Ball To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17153317.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17310450.gif Type: image/gif Size: 60281 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu Aug 23 14:27:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 23 Aug 2018 13:27:41 +0000 Subject: [gpfsug-discuss] Call home Message-ID: <696B8436-17A4-4EEC-933E-7B1B0B13D498@bham.ac.uk> Hi, I?m just having a poke around with the callhome feature. If I use `mmcallhome group auto`, I can see that it creates a group. Now if I add a node to the cluster, how to I add that node to the same call home group that is already present? If I try for example: $ mmcallhome group add autoGroup_1 MYNEWSERVER --node all Failed to add this group: Group name "autoGroup_1" is already used ? 
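As the reply below suggests, the approach is to regenerate the grouping with the --force option rather than add a node to an existing group. A minimal sketch, assuming it is acceptable to let call home recompute the group layout on this cluster:

# Recreate the call home groups so the new node is included;
# --force overwrites the existing autoGroup_1
mmcallhome group auto --force

# Verify the regenerated groups
mmcallhome group list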
Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Aug 23 14:57:43 2018 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 23 Aug 2018 13:57:43 +0000 Subject: [gpfsug-discuss] Call home In-Reply-To: <696B8436-17A4-4EEC-933E-7B1B0B13D498@bham.ac.uk> Message-ID: Hi Simon, Just recreate the group using mmcallhome group auto command together with ?force option to overwrite the existing group. Sent from my iPhone using IBM Verse On 23. Aug 2018, 15:27:51, S.J.Thompson at bham.ac.uk wrote: From: S.J.Thompson at bham.ac.uk To: gpfsug-discuss at spectrumscale.org Cc: Date: 23. Aug 2018, 15:27:51 Subject: [gpfsug-discuss] Call home Hi, I?m just having a poke around with the callhome feature. If I use `mmcallhome group auto`, I can see that it creates a group. Now if I add a node to the cluster, how to I add that node to the same call home group that is already present? If I try for example: $ mmcallhome group add autoGroup_1 MYNEWSERVER --node all Failed to add this group: Group name "autoGroup_1" is already used ? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 23 15:25:00 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 10:25:00 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: References: Message-ID: Richard Powell, Good that you have it down to a smallish test case. Let's see it! Here's my test case. Notice I use -L 2 and -I test to see what's what: [root@/main/gpfs-git]$mmapplypolicy c41 -P /gh/c41gp.policy -L 2 -I test [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied cool 66048 9436160 0.699945741% system 1190656 8388608 14.193725586% xtra 66048 8388608 0.787353516% [I] 4045 of 65792 inodes used: 6.148164%. [I] Loaded policy rules from /gh/c41gp.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2018-08-23 at 14:11:26 UTC Parsed 2 policy rules. rule 'gp' group pool 'gp' is 'system' limit(3) then 'cool' limit(4) then 'xtra' rule 'mig' migrate from pool 'gp' to pool 'gp' weight(rand()) [I] 2018-08-23 at 14:11:26.367 Directory entries scanned: 8. [I] Directories scan: 7 files, 1 directories, 0 other objects, 0 'skipped' files and/or errors. [I] 2018-08-23 at 14:11:26.371 Sorting 8 file list records. [I] 2018-08-23 at 14:11:26.416 Policy evaluation. 8 files scanned. [I] 2018-08-23 at 14:11:26.421 Sorting 7 candidate file list records. WEIGHT(0.911647) MIGRATE /c41/100e TO POOL gp/cool SHOW() WEIGHT(0.840188) MIGRATE /c41/100a TO POOL gp/cool SHOW() WEIGHT(0.798440) MIGRATE /c41/100d TO POOL gp/cool SHOW() WEIGHT(0.783099) MIGRATE /c41/100c TO POOL gp/xtra SHOW() WEIGHT(0.394383) MIGRATE /c41/100b TO POOL gp/xtra SHOW() WEIGHT(0.335223) MIGRATE /c41/100g TO POOL gp/xtra SHOW() WEIGHT(0.197551) MIGRATE /c41/100f TO POOL gp/xtra SHOW() [I] 2018-08-23 at 14:11:26.430 Choosing candidate files. 7 records scanned. [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 7 716800 7 716800 0 RULE 'mig' MIGRATE FROM POOL 'gp' WEIGHT(.) \ TO POOL 'gp' [I] Filesystem objects with no applicable rules: 1. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 716800KB: 7 of 7 candidates; [I] File Migrations within Group Pools Group Pool Files_Out KB_Out Files_In KB_In gp system 7 716800 0 0 gp cool 0 0 3 307200 gp xtra 0 0 4 409600 Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied cool 373248 9436160 3.955507325% system 473856 8388608 5.648803711% xtra 475648 8388608 5.670166016% -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Aug 23 16:23:33 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 11:23:33 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Aug 23 16:32:24 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Thu, 23 Aug 2018 11:32:24 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson > On Aug 23, 2018, at 11:23 AM, Marc A Kaplan wrote: > > Millions of files per directory, may well be a mistake... > > BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- > because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 23 18:01:27 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 13:01:27 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> Message-ID: Even with nfs or samba export you're probably okay as long as the application does not attempt to list the directory. Just probe it with stat/open/create/unlink. From: david_johnson at brown.edu To: gpfsug main discussion list Date: 08/23/2018 11:34 AM Subject: Re: [gpfsug-discuss] Those users.... 
millions of files per directory - not necessarily a mistake Sent by: gpfsug-discuss-bounces at spectrumscale.org But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson On Aug 23, 2018, at 11:23 AM, Marc A Kaplan wrote: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Aug 23 19:30:30 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Aug 2018 18:30:30 +0000 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> Message-ID: Thankfully all application developers completely understand why listing directories are a bad idea... ;o) Or at least they will learn the hard way otherwise, -B From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Marc A Kaplan Sent: Thursday, August 23, 2018 12:01 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake Note: External Email ________________________________ Even with nfs or samba export you're probably okay as long as the application does not attempt to list the directory. Just probe it with stat/open/create/unlink. From: david_johnson at brown.edu To: gpfsug main discussion list > Date: 08/23/2018 11:34 AM Subject: Re: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson On Aug 23, 2018, at 11:23 AM, Marc A Kaplan > wrote: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. 
If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Aug 23 19:37:21 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 23 Aug 2018 18:37:21 +0000 Subject: [gpfsug-discuss] Are you attending IBM TechU in Hollywood, FL in October? Message-ID: <754D53F3-70C8-4481-9219-1665214C9302@nuance.com> Hi, if you are attending the IBM TechU in October, and are interested in giving a sort client perspective on Spectrum Scale, I?d like to hear from you. On October 15th, there will be a small ?mini-UG? session at this TechU and we?d like to include a client presentation. The rough outline is below, and as you can see it?s ?short and sweet?. Please drop me a note if you?d like to present. 10 mins ? Welcome & Introductions 45 mins ? Spectrum Scale/ESS Latest Enhancements and IBM Coral Project 30 mins - Spectrum Scale Use Cases 20 mins ? Spectrum Scale Client presentation 20 mins ? Spectrum Scale Roadmap 15 mins ? Questions & Close Close ? Drinks & Networking Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Sat Aug 25 01:12:08 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 24 Aug 2018 17:12:08 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> Message-ID: <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! > On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose wrote: > > All, don?t forget registration ends on the early side for this event due to background checks, etc. > > As noted below: > > IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. > > Hope you?ll be able to attend! > > Best, > Kristy > >> On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: >> >> All, >> >> Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: >> ? the draft agenda (bottom of page), >> ? a link to registration, register by September 1 due to ORNL site requirements (see next line) >> ? an important note about registration requirements for going to Oak Ridge National Lab >> ? 
a request for your site presentations >> ? information about HPCXXL and who to contact for information about joining, and >> ? other upcoming events. >> >> Hope you can attend and see Summit and Alpine first hand. >> >> Best, >> Kristy >> >> Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 >> >> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. >> >> ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. >> >> About HPCXXL: >> HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. >> The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. >> To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. >> >> Other upcoming GPFS/SS events: >> Sep 19+20 HPCXXL, Oak Ridge >> Aug 10 Meetup along TechU, Sydney >> Oct 24 NYC User Meeting, New York >> Nov 11 SC, Dallas >> Dec 12 CIUK, Manchester >> >> >> Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ >> Duration Start End Title >> >> Wednesday 19th, 2018 >> >> Speaker >> >> TBD >> Chris Maestas (IBM) TBD (IBM) >> TBD (IBM) >> John Lewars (IBM) >> >> *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) >> John Lewars (IBM) >> >> Carl Zetie (IBM) TBD >> >> TBD (ORNL) >> TBD (IBM) >> William Godoy (ORNL) Ted Hoover (IBM) >> >> Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All >> >> 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 >> >> 13:15 Welcome >> 13:45 What is new in Spectrum Scale? >> 14:00 What is new in ESS? 
>> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === >> 15:40 AWE >> 16:00 CSCS site report >> 16:20 Starfish (Sponsor talk) >> 16:50 Network Flow >> 17:20 RFEs >> 17:30 W rap-up >> >> Thursday 19th, 2018 >> >> 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 >> >> 08:50 Alpine ? the Summit file system >> 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library >> 10:00 AI Reference Architecture >> 10:30 === BREAK === >> 11:00 Encryption on the wire and on rest 11:30 Service Update >> 12:00 Open Forum >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ethan-Hereth at utc.edu Mon Aug 27 16:42:17 2018 From: Ethan-Hereth at utc.edu (Hereth, Ethan) Date: Mon, 27 Aug 2018 15:42:17 +0000 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov>, <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> Message-ID: Good morning gpfsug!! TLDR: What day is included in the free GPFS/SS UGM? Can somebody please confirm for me the date(s) for the free GPFS/SS workshop/UGM? Firstly, it appears as if it's on both the 19th and 20th, secondly, the Eventbrite form says that I need to be very accurate so I want to be sure. I'm just 1.5 hours away, so I'm hoping to drive up for the UGM. Cheers! -- Ethan Alan Hereth, PhD High Performance Computing Specialist SimCenter: National Center for Computational Engineering 701 East M.L. King Boulevard Chattanooga, TN 37403 [work]:423.425.5431 [cell]:423.991.4971 ethan-hereth at utc.edu www.utc.edu/simcenter ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose Sent: Friday, August 24, 2018 8:12:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose > wrote: All, don?t forget registration ends on the early side for this event due to background checks, etc. As noted below: IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Hope you?ll be able to attend! Best, Kristy On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: All, Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: ? the draft agenda (bottom of page), ? a link to registration, register by September 1 due to ORNL site requirements (see next line) ? an important note about registration requirements for going to Oak Ridge National Lab ? a request for your site presentations ? information about HPCXXL and who to contact for information about joining, and ? other upcoming events. Hope you can attend and see Summit and Alpine first hand. Best, Kristy Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. 
So don't wait too long to make your travel decisions. ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. About HPCXXL: HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. Other upcoming GPFS/SS events: Sep 19+20 HPCXXL, Oak Ridge Aug 10 Meetup along TechU, Sydney Oct 24 NYC User Meeting, New York Nov 11 SC, Dallas Dec 12 CIUK, Manchester Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ Duration Start End Title Wednesday 19th, 2018 Speaker TBD Chris Maestas (IBM) TBD (IBM) TBD (IBM) John Lewars (IBM) *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) John Lewars (IBM) Carl Zetie (IBM) TBD TBD (ORNL) TBD (IBM) William Godoy (ORNL) Ted Hoover (IBM) Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 13:15 Welcome 13:45 What is new in Spectrum Scale? 14:00 What is new in ESS? 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === 15:40 AWE 16:00 CSCS site report 16:20 Starfish (Sponsor talk) 16:50 Network Flow 17:20 RFEs 17:30 W rap-up Thursday 19th, 2018 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 08:50 Alpine ? the Summit file system 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library 10:00 AI Reference Architecture 10:30 === BREAK === 11:00 Encryption on the wire and on rest 11:30 Service Update 12:00 Open Forum -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kkr at lbl.gov Tue Aug 28 05:17:49 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 27 Aug 2018 21:17:49 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> Message-ID: <1802E998-2152-4FDE-9CE7-974203782317@lbl.gov> Two half-days are included. Wednesday 19th, 2018 starting 1p. Thursday 19th, 2018, starting 830 am. I believe there is a plan for a data center tour at the end of Thursday sessions "Summit Facility Tour? on the HPCXXL agenda. Let me know if there are other questions. -Kristy PS - Latest schedule is (PDF): > On Aug 27, 2018, at 8:42 AM, Hereth, Ethan wrote: > > Good morning gpfsug!! > > TLDR: What day is included in the free GPFS/SS UGM? > > Can somebody please confirm for me the date(s) for the free GPFS/SS workshop/UGM? Firstly, it appears as if it's on both the 19th and 20th, secondly, the Eventbrite form says that I need to be very accurate so I want to be sure. > > I'm just 1.5 hours away, so I'm hoping to drive up for the UGM. > > Cheers! > > -- > Ethan Alan Hereth, PhD > High Performance Computing Specialist > > SimCenter: National Center for Computational Engineering > 701 East M.L. King Boulevard > Chattanooga, TN 37403 > > [work]:423.425.5431 > [cell]:423.991.4971 > ethan-hereth at utc.edu > www.utc.edu/simcenter > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose > Sent: Friday, August 24, 2018 8:12:08 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 > > You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! > > >> On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose > wrote: >> >> All, don?t forget registration ends on the early side for this event due to background checks, etc. >> >> As noted below: >> >> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. >> >> Hope you?ll be able to attend! >> >> Best, >> Kristy >> >>> On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: >>> >>> All, >>> >>> Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: >>> ? the draft agenda (bottom of page), >>> ? a link to registration, register by September 1 due to ORNL site requirements (see next line) >>> ? an important note about registration requirements for going to Oak Ridge National Lab >>> ? a request for your site presentations >>> ? information about HPCXXL and who to contact for information about joining, and >>> ? other upcoming events. >>> >>> Hope you can attend and see Summit and Alpine first hand. >>> >>> Best, >>> Kristy >>> >>> Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 >>> >>> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. >>> >>> ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. 
>>> >>> About HPCXXL: >>> HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. >>> The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. >>> To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. >>> >>> Other upcoming GPFS/SS events: >>> Sep 19+20 HPCXXL, Oak Ridge >>> Aug 10 Meetup along TechU, Sydney >>> Oct 24 NYC User Meeting, New York >>> Nov 11 SC, Dallas >>> Dec 12 CIUK, Manchester >>> >>> >>> Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ >>> Duration Start End Title >>> Wednesday 19th, 2018 >>> Speaker >>> TBD >>> Chris Maestas (IBM) TBD (IBM) >>> TBD (IBM) >>> John Lewars (IBM) >>> *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) >>> John Lewars (IBM) >>> Carl Zetie (IBM) TBD >>> TBD (ORNL) >>> TBD (IBM) >>> William Godoy (ORNL) Ted Hoover (IBM) >>> Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All >>> 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 >>> 13:15 Welcome >>> 13:45 What is new in Spectrum Scale? >>> 14:00 What is new in ESS? >>> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === >>> 15:40 AWE >>> 16:00 CSCS site report >>> 16:20 Starfish (Sponsor talk) >>> 16:50 Network Flow >>> 17:20 RFEs >>> 17:30 W rap-up >>> Thursday 19th, 2018 >>> 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 >>> 08:50 Alpine ? the Summit file system >>> 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library >>> 10:00 AI Reference Architecture >>> 10:30 === BREAK === >>> 11:00 Encryption on the wire and on rest 11:30 Service Update >>> 12:00 Open Forum >>> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: SSUG18HPCXXL - Agenda - 2018-08-20.pdf Type: application/pdf Size: 109797 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Aug 28 05:51:33 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 27 Aug 2018 21:51:33 -0700 Subject: [gpfsug-discuss] Hiring at NERSC Message-ID: <3721D290-56CB-4D82-9C70-1AF4E2D82CB9@lbl.gov> Hi storage folks, We?re hiring here at NERSC. There are two openings on the storage team at the National Energy Research Scientific Computing Center (NERSC, Berkeley, CA). One for a storage systems administrator and the other for a storage systems developer. If you have questions about the job or the area, let me know. Check the job posting out here: http://m.rfer.us/LBLlpzxG http://m.rfer.us/LBLmOKxH Cheers, Kristy From r.sobey at imperial.ac.uk Tue Aug 28 11:09:23 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 28 Aug 2018 10:09:23 +0000 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: I?m coming late to the party on this so forgive me, but I found that even using QoS I could not even snapshot my filesets in a timely fashion, so my rebalancing could only run at weekends with snapshotting disabled. Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of David Johnson Sent: 20 August 2018 17:55 To: gpfsug main discussion list Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full. Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) ------------- -------------------- ------------------- (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Tue Aug 28 13:22:46 2018 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Tue, 28 Aug 2018 14:22:46 +0200 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC Message-ID: Hi all, I was looking into HAWC , using the 'distributed fast storage in client nodes' method ( https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm ) This is achieved by putting? a local device on the clients in the system.log pool. 
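A minimal sketch of that setup, for reference. Everything below is illustrative rather than a tested recipe: the device names, node names, failure groups, the usage= value and the file system name fs0 are assumptions, and 64K is simply the maximum HAWC threshold that mmlsfs reports for --write-cache-threshold. Check the mmcrnsd/mmadddisk and HAWC documentation before copying it.

# hawc-log.stanza -- one NSD on a fast local device per participating client (example values only)
%nsd: nsd=client01_log device=/dev/nvme0n1 servers=client01 usage=metadataOnly failureGroup=101 pool=system.log
%nsd: nsd=client02_log device=/dev/nvme0n1 servers=client02 usage=metadataOnly failureGroup=102 pool=system.log

# define the NSDs, add them to the file system, then raise the HAWC threshold
mmcrnsd -F hawc-log.stanza
mmadddisk fs0 -F hawc-log.stanza
mmchfs fs0 --write-cache-threshold 64K

See also the note of caution about failure-group placement later in this thread before settling on per-node failure groups.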
Reading another article (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm ) this would now be used for ALL File system recovery logs. Does this mean that if you have a (small) subset of clients with fast local devices added in the system.log pool, all other clients will use these too instead of the central system pool? Thank you! Kenneth From dod2014 at med.cornell.edu Wed Aug 29 03:51:08 2018 From: dod2014 at med.cornell.edu (Douglas Duckworth) Date: Tue, 28 Aug 2018 22:51:08 -0400 Subject: [gpfsug-discuss] More Drives For DDN 12KX Message-ID: Hi We have a 12KX which will be under support until 2020. Users are currently happy with throughput but we need greater capacity as approaching 80%. The enclosures are only half full. Does DDN require adding disks through them or can we get more 6TB SAS through someone else? We would want support contract for the new disks. If possible I think this would be a good stopgap solution until 2020 when we can buy a new faster cluster. Thank you for your feedback. -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Wed Aug 29 04:55:55 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 28 Aug 2018 20:55:55 -0700 Subject: [gpfsug-discuss] More Drives For DDN 12KX In-Reply-To: References: Message-ID: <20180829035555.GA32405@almaren> I would ask DDN this, but my guess is that even if the drives work, you would run into support headaches proving that whatever problem you're running into isn't the result of 3rd-party drives. Even with supported drives, we've run into drive firmware issues with almost all of our storage systems (not just DDN, but Isilon, Hitachi, EMC, etc.); for supported drives, it's a hassle to prove and then get updated, but it would be even worse without support on your side. On Tue, Aug 28, 2018 at 10:51:08PM -0400, Douglas Duckworth wrote: > Hi > > We have a 12KX which will be under support until 2020. Users are currently > happy with throughput but we need greater capacity as approaching 80%. > The enclosures are only half full. > > Does DDN require adding disks through them or can we get more 6TB SAS > through someone else? We would want support contract for the new disks. > If possible I think this would be a good stopgap solution until 2020 when > we can buy a new faster cluster. > > Thank you for your feedback. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From robert at strubi.ox.ac.uk Wed Aug 29 09:31:10 2018 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Wed, 29 Aug 2018 09:31:10 +0100 Subject: [gpfsug-discuss] More Drives For DDN 12KX In-Reply-To: References: Message-ID: Realistically I can't see why you'd want to risk invalidating the support contracts that you have in place. You'll also take on worrying about firmware etc etc that is normally taken care of! You will need the caddies as well. We've just done this exercise SFA12KXE and 6TB SAS drives and as well as doubling space we got significantly more performance (after mmrestripe, unless your network is the bottleneck). We left 10 free slots for a potential SSD upgrade (in case of a large increase in inodes or small files). 
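Picking up the rebalancing sub-thread above (mmrestripefs -b restricted to one pool, and the QoS experience Richard describes): a rough sketch of running the restripe under the QoS maintenance class so it competes less with user I/O. The 500IOPS cap and the fs0 device name are placeholders (substitute your own file system device and a limit sized for your hardware); mmchqos, mmlsqos and the mmrestripefs --qos option are the documented QoS controls, though as noted above even a throttled restripe can still crowd out snapshot creation.

mmchqos fs0 --enable pool=*,maintenance=500IOPS,other=unlimited   # cap maintenance-class I/O in every pool
mmrestripefs fs0 -b -P cit_10tb --qos maintenance                 # rebalance, restricted to the cit_10tb pool, throttled
mmlsqos fs0                                                       # watch what the restripe is actually consuming
mmchqos fs0 --disable                                             # lift the caps once the restripe finishes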
Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI) ? -----Original Message----- From: "Douglas Duckworth" To: gpfsug-discuss at spectrumscale.org Date: 29/08/18 04:49 Subject: [gpfsug-discuss] More Drives For DDN 12KX Hi We have a 12KX which will be under support until 2020. Users are currently happy with throughput but we need greater capacity as approaching 80%. The enclosures are only half full. Does DDN require adding disks through them or can we get more 6TB SAS through someone else? We would want support contract for the new disks. If possible I think this would be a good stopgap solution until 2020 when we can buy a new faster cluster. Thank you for your feedback. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Thu Aug 30 23:34:07 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 30 Aug 2018 23:34:07 +0100 Subject: [gpfsug-discuss] fast ACL alter solution In-Reply-To: <55CA6182.9010507@buzzard.me.uk> References: <201508111811.t7BIBYt0004336@d03av04.boulder.ibm.com> <55CA6182.9010507@buzzard.me.uk> Message-ID: On 11/08/15 21:56, Jonathan Buzzard wrote: [SNIP] > > As I said previously what is needed is an "mm" version of the FreeBSD > setfacl command > > http://www.freebsd.org/cgi/man.cgi?format=html&query=setfacl(1) > > That has the -R/--recursive option of the Linux setfacl command which > uses the fast inode scanning GPFS API. > > You want to be able to type something like > > ?mmsetfacl -mR g:www:rpaRc::allow foo > > What you don't want to be doing is calling the abomination of a command > that is mmputacl. Frankly whoever is responsible for that command needs > taking out the back and given a good kicking. A further three years down the line and setting NFSv4 ACL's on the Linux command line is still as painful as it was back in 2011. So I again have a requirement to set NFSv4 ACL's server side :-( Futher, unfortunately somewhere in the last six years I lost my C code to do this :-( In the process of redoing it I have been looking at the source code for the Linux NFSv4 ACL tools. I think that with minimal modification they can be ported to GPFS. So far I have hacked up nfs4_getfacl to work, and it should not be too much extra effort to hack up nfs_setfacl as well. However I have a some questions. Firstly what's the purpose of a special flag to indicate that it is smbd setting the ACL? Does this tie in with the undocumented "mmchfs -k samba" feature? Second there is a whole bunch of stuff about v4.1 ACL's. How does one trigger that. All I seem to be able to do is get POSIX and v4 ACL's. Do you get v4.1 ACL's if you set the file system to "Samba" ACL's? Note in the longer term it I think it would be better to modify FreeBSD's setfacl/getfacl (say renamed to mmsetfacl and mmgetfacl) to do the job, on the basis that they handle both POSIX and NFSv4 ACL's in a single command. Perhaps a RFE? JAB. -- Jonathan A. 
Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From vtarasov at us.ibm.com Fri Aug 31 18:49:01 2018 From: vtarasov at us.ibm.com (Vasily Tarasov) Date: Fri, 31 Aug 2018 17:49:01 +0000 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Aug 31 19:25:34 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 31 Aug 2018 18:25:34 +0000 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC In-Reply-To: References: , Message-ID: I'm going to add a note of caution about HAWC as well... Firstly this was based on when it was first released,so things might have changed... HAWC replication uses the same failure group policy for placing replicas, therefore you need to use different failure groups for different client nodes. But do this carefully thinking about your failure domains. For example, we initially set each node in a cluster with its own failure group, might seem like a good idea until you shut the rack down (or even just a few select nodes might do it). You then lose your whole storage cluster by accident. (Or maybe you have hpc nodes and no UPS protection, if they have hawk and there is no protected replica, you lose the fs). Maybe this is obvious to everyone, but it bit us in various ways in our early testing. So if you plan to implement it, do test how your storage reacts when a client node fails. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of vtarasov at us.ibm.com [vtarasov at us.ibm.com] Sent: 31 August 2018 18:49 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] system.log pool on client nodes for HAWC That is correct. The blocks of each recovery log are striped across the devices in the system.log pool (if it is defined). As a result, even when all clients have a local device in the system.log pool, many writes to the recovery log will go to remote devices. For a client that lacks a local device in the system.log pool, log writes will always be remote. Notice, that typically in such a setup you would enable log replication for HA. Otherwise, if a single client fails (and its recover log is lost) the whole cluster fails as there is no log to recover FS to consistent state. Therefore, at least one remote write is essential. HTH, -- Vasily Tarasov, Research Staff Member, Storage Systems Research, IBM Research - Almaden ----- Original message ----- From: Kenneth Waegeman Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC Date: Tue, Aug 28, 2018 5:31 AM Hi all, I was looking into HAWC , using the 'distributed fast storage in client nodes' method ( https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm ) This is achieved by putting a local device on the clients in the system.log pool. Reading another article (https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm ) this would now be used for ALL File system recovery logs. Does this mean that if you have a (small) subset of clients with fast local devices added in the system.log pool, all other clients will use these too instead of the central system pool? 
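To make the failure-group caution above concrete: a sketch (same stanza shape as the earlier HAWC example, with invented rack, node and device names) that gives every node in a rack the same failure group, so that, with log replication enabled, the replicas of a recovery log are forced into different racks rather than merely different nodes. mmlsdisk shows which failure group each disk actually landed in.

# one failure group per rack, not per node (illustrative values)
%nsd: nsd=rack1_n01_log device=/dev/nvme0n1 servers=node01 usage=metadataOnly failureGroup=1 pool=system.log
%nsd: nsd=rack1_n02_log device=/dev/nvme0n1 servers=node02 usage=metadataOnly failureGroup=1 pool=system.log
%nsd: nsd=rack2_n17_log device=/dev/nvme0n1 servers=node17 usage=metadataOnly failureGroup=2 pool=system.log
%nsd: nsd=rack2_n18_log device=/dev/nvme0n1 servers=node18 usage=metadataOnly failureGroup=2 pool=system.log

mmlsdisk fs0 -L   # verify the failure group column before relying on log replication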
Thank you! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 17:55:04 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 16:55:04 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Message-ID: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Aug 1 18:21:01 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 13:21:01 -0400 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+????? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Aug 1 19:21:28 2018 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 1 Aug 2018 14:21:28 -0400 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: "Marc A Kaplan" To: gpfsug main discussion list Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:08:08 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:08:08 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu> Hi Marc, Thanks for the response ? I understand what you?re saying, but since I?m asking for a 1 MB block size for metadata and a 4 MB block size for data and according to the chart in the mmcrfs man page both result in an 8 KB sub block size I?m still confused as to why I?ve got a 32 KB sub block size for my non-system (i.e. data) pools? Especially when you consider that 32 KB isn?t the default even if I had chosen an 8 or 16 MB block size! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 12:21 PM, Marc A Kaplan > wrote: I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd84fdde05c65406d4d9008d5f7d32f0f%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687408760535040&sdata=hqVZVIQLbxakARTspzbSkMZBHi2b6%2BIcrPLU1atNbus%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Aug 1 19:41:05 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 1 Aug 2018 11:41:05 -0700 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop wrote: > Marc, Kevin, > > We'll be looking into this issue, since at least at a first glance, it > does look odd. A 4MB block size should have resulted in an 8KB subblock > size. I suspect that, somehow, the *--metadata-block-size** 1M* may have > resulted in > > > 32768 Minimum fragment (subblock) size in bytes (other pools) > > but I do not yet understand how. > > The *subblocks-per-full-block* parameter is not supported with *mmcrfs *. > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > [image: graycol.gif]"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't > looked into all the details but here's a clue -- notice there is only one > "subblocks-per- > > From: "Marc A Kaplan" > > > To: gpfsug main discussion list > > Date: 08/01/2018 01:21 PM > Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? > > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I haven't looked into all the details but here's a clue -- notice there is > only one "subblocks-per-full-block" parameter. > > And it is the same for both metadata blocks and datadata blocks. > > So maybe (MAYBE) that is a constraint somewhere... > > Certainly, in the currently supported code, that's what you get. > > > > > From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM > Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Our production cluster is still on GPFS 4.2.3.x, but in preparation for > moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS > 5.0.1-1. I am setting up a new filesystem there using hardware that we > recently life-cycled out of our production environment. > > I ?successfully? created a filesystem but I believe the sub-block size is > wrong. 
I?m using a 4 MB filesystem block size, so according to the mmcrfs > man page the sub-block size should be 8K: > > Table 1. Block sizes and subblock sizes > > +???????????????????????????????+???????????????????????????????+ > | Block size | Subblock size | > +???????????????????????????????+???????????????????????????????+ > | 64 KiB | 2 KiB | > +???????????????????????????????+???????????????????????????????+ > | 128 KiB | 4 KiB | > +???????????????????????????????+???????????????????????????????+ > | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | > | MiB, 4 MiB | | > +???????????????????????????????+???????????????????????????????+ > | 8 MiB, 16 MiB | 16 KiB | > +???????????????????????????????+???????????????????????????????+ > > However, it appears that it?s 8K for the system pool but 32K for the other > pools: > > flag value description > ------------------- ------------------------ > ----------------------------------- > -f 8192 Minimum fragment (subblock) size in bytes (system pool) > 32768 Minimum fragment (subblock) size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of metadata replicas > -M 3 Maximum number of metadata replicas > -r 1 Default number of data replicas > -R 3 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 32 Estimated number of nodes that will mount file system > -B 1048576 Block size (system pool) > 4194304 Block size (other pools) > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Wed Aug 1 11:39:39 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea Yes Fast external attributes enabled? > --encryption No Encryption enabled? > --inode-limit 101095424 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per full block > -P system;raid1;raid6 Disk storage pools in file system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? 
> -d > test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd > Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs5 Default mount point > --mount-priority 0 Mount priority > > Output of mmcrfs: > > mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter > -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes > --nofilesetdf --metadata-block-size 1M > > The following disks of gpfs5 will be formatted on node testnsd3: > test21A3nsd: size 953609 MB > test21A4nsd: size 953609 MB > test21B3nsd: size 953609 MB > test21B4nsd: size 953609 MB > test23Ansd: size 15259744 MB > test23Bnsd: size 15259744 MB > test23Cnsd: size 1907468 MB > test24Ansd: size 15259744 MB > test24Bnsd: size 15259744 MB > test24Cnsd: size 1907468 MB > test25Ansd: size 15259744 MB > test25Bnsd: size 15259744 MB > test25Cnsd: size 1907468 MB > Formatting file system ... > Disks up to size 8.29 TB can be added to storage pool system. > Disks up to size 16.60 TB can be added to storage pool raid1. > Disks up to size 132.62 TB can be added to storage pool raid6. > Creating Inode File > 8 % complete on Wed Aug 1 11:39:19 2018 > 18 % complete on Wed Aug 1 11:39:24 2018 > 27 % complete on Wed Aug 1 11:39:29 2018 > 37 % complete on Wed Aug 1 11:39:34 2018 > 48 % complete on Wed Aug 1 11:39:39 2018 > 60 % complete on Wed Aug 1 11:39:44 2018 > 72 % complete on Wed Aug 1 11:39:49 2018 > 83 % complete on Wed Aug 1 11:39:54 2018 > 95 % complete on Wed Aug 1 11:39:59 2018 > 100 % complete on Wed Aug 1 11:40:01 2018 > Creating Allocation Maps > Creating Log Files > 3 % complete on Wed Aug 1 11:40:07 2018 > 28 % complete on Wed Aug 1 11:40:14 2018 > 53 % complete on Wed Aug 1 11:40:19 2018 > 78 % complete on Wed Aug 1 11:40:24 2018 > 100 % complete on Wed Aug 1 11:40:25 2018 > Clearing Inode Allocation Map > Clearing Block Allocation Map > Formatting Allocation Map for storage pool system > 85 % complete on Wed Aug 1 11:40:32 2018 > 100 % complete on Wed Aug 1 11:40:33 2018 > Formatting Allocation Map for storage pool raid1 > 53 % complete on Wed Aug 1 11:40:38 2018 > 100 % complete on Wed Aug 1 11:40:42 2018 > Formatting Allocation Map for storage pool raid6 > 20 % complete on Wed Aug 1 11:40:47 2018 > 39 % complete on Wed Aug 1 11:40:52 2018 > 60 % complete on Wed Aug 1 11:40:57 2018 > 79 % complete on Wed Aug 1 11:41:02 2018 > 100 % complete on Wed Aug 1 11:41:08 2018 > Completed creation of file system /dev/gpfs5. > mmcrfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. 
> > And contents of stanza file: > > %nsd: > nsd=test21A3nsd > usage=metadataOnly > failureGroup=210 > pool=system > servers=testnsd3,testnsd1,testnsd2 > device=dm-15 > > %nsd: > nsd=test21A4nsd > usage=metadataOnly > failureGroup=210 > pool=system > servers=testnsd1,testnsd2,testnsd3 > device=dm-14 > > %nsd: > nsd=test21B3nsd > usage=metadataOnly > failureGroup=211 > pool=system > servers=testnsd1,testnsd2,testnsd3 > device=dm-17 > > %nsd: > nsd=test21B4nsd > usage=metadataOnly > failureGroup=211 > pool=system > servers=testnsd2,testnsd3,testnsd1 > device=dm-16 > > %nsd: > nsd=test23Ansd > usage=dataOnly > failureGroup=23 > pool=raid6 > servers=testnsd2,testnsd3,testnsd1 > device=dm-10 > > %nsd: > nsd=test23Bnsd > usage=dataOnly > failureGroup=23 > pool=raid6 > servers=testnsd3,testnsd1,testnsd2 > device=dm-9 > > %nsd: > nsd=test23Cnsd > usage=dataOnly > failureGroup=23 > pool=raid1 > servers=testnsd1,testnsd2,testnsd3 > device=dm-5 > > %nsd: > nsd=test24Ansd > usage=dataOnly > failureGroup=24 > pool=raid6 > servers=testnsd3,testnsd1,testnsd2 > device=dm-6 > > %nsd: > nsd=test24Bnsd > usage=dataOnly > failureGroup=24 > pool=raid6 > servers=testnsd1,testnsd2,testnsd3 > device=dm-0 > > %nsd: > nsd=test24Cnsd > usage=dataOnly > failureGroup=24 > pool=raid1 > servers=testnsd2,testnsd3,testnsd1 > device=dm-2 > > %nsd: > nsd=test25Ansd > usage=dataOnly > failureGroup=25 > pool=raid6 > servers=testnsd1,testnsd2,testnsd3 > device=dm-6 > > %nsd: > nsd=test25Bnsd > usage=dataOnly > failureGroup=25 > pool=raid6 > servers=testnsd2,testnsd3,testnsd1 > device=dm-6 > > %nsd: > nsd=test25Cnsd > usage=dataOnly > failureGroup=25 > pool=raid1 > servers=testnsd3,testnsd1,testnsd2 > device=dm-3 > > %pool: > pool=system > blockSize=1M > usage=metadataOnly > layoutMap=scatter > allowWriteAffinity=no > > %pool: > pool=raid6 > blockSize=4M > usage=dataOnly > layoutMap=scatter > allowWriteAffinity=no > > %pool: > pool=raid1 > blockSize=4M > usage=dataOnly > layoutMap=scatter > allowWriteAffinity=no > > What am I missing or what have I done wrong? Thanks? > > Kevin > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > *Kevin.Buterbaugh at vanderbilt.edu* - > (615)875-9633 <(615)%20875-9633> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Aug 1 19:47:31 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 14:47:31 -0400 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? 
In-Reply-To: <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <2E17AB2D-AC59-4A36-A6D8-235C2C2439C3@vanderbilt.edu> Message-ID: I guess that particular table is not the whole truth, nor a specification, nor a promise, but a simplified summary of what you get when there is just one block size that applies to both meta-data and data-data. You have discovered that it does not apply to systems where metadata has a different blocksize than data-data. My guesstimate (speculation!) is that the deployed code chooses one subblocks-per-full-block parameter and applies that to both, which would explain the results we're seeing. Further, it seems that the mmlsfs command assumes, at least in some places, that there is only one subblocks-per-block parameter... Looking deeper into the code is another story for another day -- but I'll say that there seems to be sufficient flexibility that, if this were deemed a burning issue, there could be further "enhancements..." ;-) From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 02:24 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Marc, Thanks for the response -- I understand what you're saying, but since I'm asking for a 1 MB block size for metadata and a 4 MB block size for data, and according to the chart in the mmcrfs man page both result in an 8 KB sub-block size, I'm still confused as to why I've got a 32 KB sub-block size for my non-system (i.e. data) pools? Especially when you consider that 32 KB isn't the default even if I had chosen an 8 or 16 MB block size! Kevin -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 12:21 PM, Marc A Kaplan wrote: I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I "successfully" created a filesystem but I believe the sub-block size is wrong. I'm using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+????? ??????????????????????????+ | Block size | Subblock size | +???????????????????????????????+????? ??????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+????? ??????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+????? ??????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+????? ??????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+?????
??????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd84fdde05c65406d4d9008d5f7d32f0f%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687408760535040&sdata=hqVZVIQLbxakARTspzbSkMZBHi2b6%2BIcrPLU1atNbus%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:52:37 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:52:37 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> All, Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ? so does this go back to what Marc is saying that there?s really only one sub blocks per block parameter? If so, is there any way to get what I want as described below? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L > wrote: Hi Sven, OK ? but why? I mean, that?s not what the man page says. Where does that ?4 x? come from? And, most importantly ? that?s not what I want. I want a smaller block size for the system pool since it?s metadata only and on RAID 1 mirrors (HD?s on the test cluster but SSD?s on the production cluster). So ? side question ? is 1 MB OK there? But I want a 4 MB block size for data with an 8 KB sub block ? I want good performance for the sane people using our cluster without unduly punishing the ? ahem ? fine folks whose apps want to create a bazillion tiny files! So how do I do that? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:41 PM, Sven Oehme > wrote: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop > wrote: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . 
Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per- From: "Marc A Kaplan" > To: gpfsug main discussion list > Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? 
--inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C8a00ac1e037d45913c8708d5f7de60ac%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687456834221377&sdata=MuPoxpCweqPxLR%2FAaWIgP%2BIkh0bUEVeG3cCzwoZoyE0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Wed Aug 1 20:10:50 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 1 Aug 2018 19:10:50 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Message-ID: Kevin asks: >>>> Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ? so does this go back to what Marc is saying that there?s really only one sub blocks per block parameter? If so, is there any way to get what I want as described below? <<< Yep. 
Basically what's happening is: When you ask for a certain block size, Scale infers the subblock size as shown in the table. As Sven said, here you are asking for 1M blocks for metadata, so you get 8KiB subblocks. So far so good. These two numbers together determine the number of subblocks per block parameter, which as Marc said is shared across all the pools. So in order for your 4M data blocks to have the same number of subblocks per block as your 1M metadata blocks, the subblocks have to be 4 times as big. Something similar would happen with *any* choice of data block size above 1M, of course. The smallest size wins, and the 8KiB number is coming from the 1M, not the 4M. (Thanks, Sven). regards, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 19:47:47 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 18:47:47 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> Message-ID: Hi Sven, OK ? but why? I mean, that?s not what the man page says. Where does that ?4 x? come from? And, most importantly ? that?s not what I want. I want a smaller block size for the system pool since it?s metadata only and on RAID 1 mirrors (HD?s on the test cluster but SSD?s on the production cluster). So ? side question ? is 1 MB OK there? But I want a 4 MB block size for data with an 8 KB sub block ? I want good performance for the sane people using our cluster without unduly punishing the ? ahem ? fine folks whose apps want to create a bazillion tiny files! So how do I do that? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:41 PM, Sven Oehme > wrote: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop > wrote: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per- From: "Marc A Kaplan" > To: gpfsug main discussion list > Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. 
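To put numbers to the explanation above (the smallest block size in the filesystem sets the subblock size from the mmcrfs table, and the resulting single subblocks-per-full-block value is then shared by every pool), here is a small Python sketch. It is illustrative only and not GPFS code; the table values come from the mmcrfs man page excerpt quoted in this thread, the sharing rule is the one Sven, Marc, and Carl describe, and the helper names are made up for this example.

# Illustrative sketch only (not GPFS source): encodes the mmcrfs man page
# table quoted in this thread plus the rule described above, i.e. the
# smallest block size fixes the subblock size, and the resulting
# subblocks-per-full-block count is shared by every pool.

def default_subblock_kib(block_kib):
    """Subblock size in KiB per the mmcrfs man page table (single block size case)."""
    if block_kib == 64:
        return 2
    if block_kib == 128:
        return 4
    if block_kib <= 4096:          # 256 KiB through 4 MiB
        return 8
    return 16                      # 8 MiB and 16 MiB

def per_pool_subblock_kib(pool_block_kib):
    """Map each pool's block size (KiB) to its effective subblock size (KiB)."""
    smallest = min(pool_block_kib)
    subblocks_per_full_block = smallest // default_subblock_kib(smallest)
    return {bs: bs // subblocks_per_full_block for bs in pool_block_kib}

# Kevin's layout: 1 MiB system (metadata) pool plus 4 MiB data pools
print(per_pool_subblock_kib([1024, 4096]))   # {1024: 8, 4096: 32}
# One 4 MiB block size everywhere keeps 8 KiB subblocks (512 per full block)
print(per_pool_subblock_kib([4096]))         # {4096: 8}

The first call reproduces the 8192 and 32768 byte fragment sizes and the 128 subblocks per full block reported by mmlsfs earlier in the thread; the second shows why a uniform block size avoids the 32 KiB data subblock.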
From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C8a00ac1e037d45913c8708d5f7de60ac%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687456834221377&sdata=MuPoxpCweqPxLR%2FAaWIgP%2BIkh0bUEVeG3cCzwoZoyE0%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Aug 1 22:01:28 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 1 Aug 2018 14:01:28 -0700 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> Message-ID: the only way to get max number of subblocks for a 5.0.x filesystem with the released code is to have metadata and data use the same blocksize. 
sven On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L < Kevin.Buterbaugh at vanderbilt.edu> wrote: > All, > > Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ? so does > this go back to what Marc is saying that there?s really only one sub blocks > per block parameter? If so, is there any way to get what I want as > described below? > > Thanks? > > Kevin > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > > > On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L < > Kevin.Buterbaugh at Vanderbilt.Edu> wrote: > > Hi Sven, > > OK ? but why? I mean, that?s not what the man page says. Where does that > ?4 x? come from? > > And, most importantly ? that?s not what I want. I want a smaller block > size for the system pool since it?s metadata only and on RAID 1 mirrors > (HD?s on the test cluster but SSD?s on the production cluster). So ? side > question ? is 1 MB OK there? > > But I want a 4 MB block size for data with an 8 KB sub block ? I want good > performance for the sane people using our cluster without unduly punishing > the ? ahem ? fine folks whose apps want to create a bazillion tiny files! > > So how do I do that? > > Thanks! > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 <(615)%20875-9633> > > > On Aug 1, 2018, at 1:41 PM, Sven Oehme wrote: > > the number of subblocks is derived by the smallest blocksize in any pool > of a given filesystem. so if you pick a metadata blocksize of 1M it will be > 8k in the metadata pool, but 4 x of that in the data pool if your data pool > is 4M. > > sven > > On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop wrote: > > Marc, Kevin, >> >> We'll be looking into this issue, since at least at a first glance, it >> does look odd. A 4MB block size should have resulted in an 8KB subblock >> size. I suspect that, somehow, the *--metadata-block-size** 1M* may have >> resulted in >> >> >> 32768 Minimum fragment (subblock) size in bytes (other pools) >> >> but I do not yet understand how. >> >> The *subblocks-per-full-block* parameter is not supported with *mmcrfs *. >> >> Felipe >> >> ---- >> Felipe Knop knop at us.ibm.com >> GPFS Development and Security >> IBM Systems >> IBM Building 008 >> 2455 South Rd, Poughkeepsie, NY 12601 >> (845) 433-9314 T/L 293-9314 >> >> >> >> "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't >> looked into all the details but here's a clue -- notice there is only one >> "subblocks-per- >> >> From: "Marc A Kaplan" >> >> >> To: gpfsug main discussion list >> >> Date: 08/01/2018 01:21 PM >> Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? >> >> >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> I haven't looked into all the details but here's a clue -- notice there >> is only one "subblocks-per-full-block" parameter. >> >> And it is the same for both metadata blocks and datadata blocks. >> >> So maybe (MAYBE) that is a constraint somewhere... >> >> Certainly, in the currently supported code, that's what you get. >> >> >> >> >> From: "Buterbaugh, Kevin L" >> To: gpfsug main discussion list >> Date: 08/01/2018 12:55 PM >> Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? 
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> ------------------------------ >> >> >> >> Hi All, >> >> Our production cluster is still on GPFS 4.2.3.x, but in preparation for >> moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS >> 5.0.1-1. I am setting up a new filesystem there using hardware that we >> recently life-cycled out of our production environment. >> >> I ?successfully? created a filesystem but I believe the sub-block size is >> wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs >> man page the sub-block size should be 8K: >> >> Table 1. Block sizes and subblock sizes >> >> +???????????????????????????????+???????????????????????????????+ >> | Block size | Subblock size | >> +???????????????????????????????+???????????????????????????????+ >> | 64 KiB | 2 KiB | >> +???????????????????????????????+???????????????????????????????+ >> | 128 KiB | 4 KiB | >> +???????????????????????????????+???????????????????????????????+ >> | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | >> | MiB, 4 MiB | | >> +???????????????????????????????+???????????????????????????????+ >> | 8 MiB, 16 MiB | 16 KiB | >> +???????????????????????????????+???????????????????????????????+ >> >> However, it appears that it?s 8K for the system pool but 32K for the >> other pools: >> >> flag value description >> ------------------- ------------------------ >> ----------------------------------- >> -f 8192 Minimum fragment (subblock) size in bytes (system pool) >> 32768 Minimum fragment (subblock) size in bytes (other pools) >> -i 4096 Inode size in bytes >> -I 32768 Indirect block size in bytes >> -m 2 Default number of metadata replicas >> -M 3 Maximum number of metadata replicas >> -r 1 Default number of data replicas >> -R 3 Maximum number of data replicas >> -j scatter Block allocation type >> -D nfs4 File locking semantics in effect >> -k all ACL semantics in effect >> -n 32 Estimated number of nodes that will mount file system >> -B 1048576 Block size (system pool) >> 4194304 Block size (other pools) >> -Q user;group;fileset Quotas accounting enabled >> user;group;fileset Quotas enforced >> none Default quotas enabled >> --perfileset-quota No Per-fileset quota enforcement >> --filesetdf No Fileset df enabled? >> -V 19.01 (5.0.1.0) File system version >> --create-time Wed Aug 1 11:39:39 2018 File system creation time >> -z No Is DMAPI enabled? >> -L 33554432 Logfile size >> -E Yes Exact mtime mount option >> -S relatime Suppress atime mount option >> -K whenpossible Strict replica allocation option >> --fastea Yes Fast external attributes enabled? >> --encryption No Encryption enabled? >> --inode-limit 101095424 Maximum number of inodes >> --log-replicas 0 Number of log replicas >> --is4KAligned Yes is4KAligned? >> --rapid-repair Yes rapidRepair enabled? >> --write-cache-threshold 0 HAWC Threshold (max 65536) >> --subblocks-per-full-block 128 Number of subblocks per full block >> -P system;raid1;raid6 Disk storage pools in file system >> --file-audit-log No File Audit Logging enabled? >> --maintenance-mode No Maintenance Mode enabled? 
>> -d >> test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd >> Disks in file system >> -A yes Automatic mount option >> -o none Additional mount options >> -T /gpfs5 Default mount point >> --mount-priority 0 Mount priority >> >> Output of mmcrfs: >> >> mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j >> scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 >> -v yes --nofilesetdf --metadata-block-size 1M >> >> The following disks of gpfs5 will be formatted on node testnsd3: >> test21A3nsd: size 953609 MB >> test21A4nsd: size 953609 MB >> test21B3nsd: size 953609 MB >> test21B4nsd: size 953609 MB >> test23Ansd: size 15259744 MB >> test23Bnsd: size 15259744 MB >> test23Cnsd: size 1907468 MB >> test24Ansd: size 15259744 MB >> test24Bnsd: size 15259744 MB >> test24Cnsd: size 1907468 MB >> test25Ansd: size 15259744 MB >> test25Bnsd: size 15259744 MB >> test25Cnsd: size 1907468 MB >> Formatting file system ... >> Disks up to size 8.29 TB can be added to storage pool system. >> Disks up to size 16.60 TB can be added to storage pool raid1. >> Disks up to size 132.62 TB can be added to storage pool raid6. >> Creating Inode File >> 8 % complete on Wed Aug 1 11:39:19 2018 >> 18 % complete on Wed Aug 1 11:39:24 2018 >> 27 % complete on Wed Aug 1 11:39:29 2018 >> 37 % complete on Wed Aug 1 11:39:34 2018 >> 48 % complete on Wed Aug 1 11:39:39 2018 >> 60 % complete on Wed Aug 1 11:39:44 2018 >> 72 % complete on Wed Aug 1 11:39:49 2018 >> 83 % complete on Wed Aug 1 11:39:54 2018 >> 95 % complete on Wed Aug 1 11:39:59 2018 >> 100 % complete on Wed Aug 1 11:40:01 2018 >> Creating Allocation Maps >> Creating Log Files >> 3 % complete on Wed Aug 1 11:40:07 2018 >> 28 % complete on Wed Aug 1 11:40:14 2018 >> 53 % complete on Wed Aug 1 11:40:19 2018 >> 78 % complete on Wed Aug 1 11:40:24 2018 >> 100 % complete on Wed Aug 1 11:40:25 2018 >> Clearing Inode Allocation Map >> Clearing Block Allocation Map >> Formatting Allocation Map for storage pool system >> 85 % complete on Wed Aug 1 11:40:32 2018 >> 100 % complete on Wed Aug 1 11:40:33 2018 >> Formatting Allocation Map for storage pool raid1 >> 53 % complete on Wed Aug 1 11:40:38 2018 >> 100 % complete on Wed Aug 1 11:40:42 2018 >> Formatting Allocation Map for storage pool raid6 >> 20 % complete on Wed Aug 1 11:40:47 2018 >> 39 % complete on Wed Aug 1 11:40:52 2018 >> 60 % complete on Wed Aug 1 11:40:57 2018 >> 79 % complete on Wed Aug 1 11:41:02 2018 >> 100 % complete on Wed Aug 1 11:41:08 2018 >> Completed creation of file system /dev/gpfs5. >> mmcrfs: Propagating the cluster configuration data to all >> affected nodes. This is an asynchronous process. 
>> >> And contents of stanza file: >> >> %nsd: >> nsd=test21A3nsd >> usage=metadataOnly >> failureGroup=210 >> pool=system >> servers=testnsd3,testnsd1,testnsd2 >> device=dm-15 >> >> %nsd: >> nsd=test21A4nsd >> usage=metadataOnly >> failureGroup=210 >> pool=system >> servers=testnsd1,testnsd2,testnsd3 >> device=dm-14 >> >> %nsd: >> nsd=test21B3nsd >> usage=metadataOnly >> failureGroup=211 >> pool=system >> servers=testnsd1,testnsd2,testnsd3 >> device=dm-17 >> >> %nsd: >> nsd=test21B4nsd >> usage=metadataOnly >> failureGroup=211 >> pool=system >> servers=testnsd2,testnsd3,testnsd1 >> device=dm-16 >> >> %nsd: >> nsd=test23Ansd >> usage=dataOnly >> failureGroup=23 >> pool=raid6 >> servers=testnsd2,testnsd3,testnsd1 >> device=dm-10 >> >> %nsd: >> nsd=test23Bnsd >> usage=dataOnly >> failureGroup=23 >> pool=raid6 >> servers=testnsd3,testnsd1,testnsd2 >> device=dm-9 >> >> %nsd: >> nsd=test23Cnsd >> usage=dataOnly >> failureGroup=23 >> pool=raid1 >> servers=testnsd1,testnsd2,testnsd3 >> device=dm-5 >> >> %nsd: >> nsd=test24Ansd >> usage=dataOnly >> failureGroup=24 >> pool=raid6 >> servers=testnsd3,testnsd1,testnsd2 >> device=dm-6 >> >> %nsd: >> nsd=test24Bnsd >> usage=dataOnly >> failureGroup=24 >> pool=raid6 >> servers=testnsd1,testnsd2,testnsd3 >> device=dm-0 >> >> %nsd: >> nsd=test24Cnsd >> usage=dataOnly >> failureGroup=24 >> pool=raid1 >> servers=testnsd2,testnsd3,testnsd1 >> device=dm-2 >> >> %nsd: >> nsd=test25Ansd >> usage=dataOnly >> failureGroup=25 >> pool=raid6 >> servers=testnsd1,testnsd2,testnsd3 >> device=dm-6 >> >> %nsd: >> nsd=test25Bnsd >> usage=dataOnly >> failureGroup=25 >> pool=raid6 >> servers=testnsd2,testnsd3,testnsd1 >> device=dm-6 >> >> %nsd: >> nsd=test25Cnsd >> usage=dataOnly >> failureGroup=25 >> pool=raid1 >> servers=testnsd3,testnsd1,testnsd2 >> device=dm-3 >> >> %pool: >> pool=system >> blockSize=1M >> usage=metadataOnly >> layoutMap=scatter >> allowWriteAffinity=no >> >> %pool: >> pool=raid6 >> blockSize=4M >> usage=dataOnly >> layoutMap=scatter >> allowWriteAffinity=no >> >> %pool: >> pool=raid1 >> blockSize=4M >> usage=dataOnly >> layoutMap=scatter >> allowWriteAffinity=no >> >> What am I missing or what have I done wrong? Thanks? >> >> Kevin >> ? 
>> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and >> Education >> Kevin.Buterbaugh at vanderbilt.edu - >> (615)875-9633 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Aug 1 22:58:26 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 1 Aug 2018 21:58:26 +0000 Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> Message-ID: <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Hi Sven (and Stephen and everyone else), I know there are certainly things you know but can't talk about, but I suspect that I am not the only one to wonder about the possible significance of "with the released code" in your response below?!? I understand the technical point you're making, and maybe the solution for me is to just use a 4 MB block size for my metadata-only system pool? As Stephen Ulmer said in his response ("Why the desire for a 1MB block size for metadata? It is RAID1 so no re-write penalty or need to hit a stripe size. Are you just trying to save the memory? If you had a 4MB block size, an 8KB sub-block size and things were 4K-aligned, you would always read 2 4K inodes,"), so if I'm using RAID 1 with 4K inodes then am I gaining anything by going with a smaller block size for metadata? So why was I choosing 1 MB in the first place? Well, I was planning on doing some experimenting with different block sizes for metadata to see if it made any difference. Historically, we had used a metadata block size of 64K to match the hardware "stripe" size on the storage arrays (RAID 1 mirrors of hard drives back in the day). Now our metadata is on SSDs, so with our latest filesystem we used 1 MB for both data and metadata because of the 1/32nd sub-block thing in GPFS 4.x. Since GPFS 5 removes that restriction, I was going to do some experimenting, but if the correct answer is just "if 4 MB is what's best for your data, then use it for metadata too" then I don't mind saving some time... ;-) Thanks... Kevin --
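To put numbers to the 1/32nd sub-block restriction mentioned above, and to why GPFS 5 removes the old tradeoff, here is a small illustrative Python sketch. It is not GPFS code; it simply restates the 4.2.x fixed 32-subblocks-per-block rule and the 5.0.x defaults from the mmcrfs table quoted earlier in the thread, under the assumption that data and metadata share one block size, and the function names are invented for this example.

# Illustrative only: smallest allocatable unit per block size, comparing the
# GPFS 4.2.x fixed 1/32 rule with the 5.0.x defaults from the mmcrfs table
# quoted earlier (assuming one block size for both data and metadata).

def subblock_4x_kib(block_kib):
    # 4.2.x and earlier: a full block is always split into 32 subblocks
    return block_kib // 32

def subblock_5x_kib(block_kib):
    # 5.0.x defaults when every pool uses this block size
    if block_kib == 64:
        return 2
    if block_kib == 128:
        return 4
    if block_kib <= 4096:
        return 8
    return 16

for bs_kib in (64, 256, 1024, 4096, 16384):
    print(f"{bs_kib:>6} KiB block: {subblock_4x_kib(bs_kib):>4} KiB subblock on 4.2.x, "
          f"{subblock_5x_kib(bs_kib):>2} KiB on 5.0.x")

On 4.2.x a 1 MiB block keeps the smallest allocation at 32 KiB, which is why 1 MiB was a reasonable compromise there; on 5.0.x a uniform 4 MiB block size already gives 8 KiB subblocks, so there is little left to trade away by using it for metadata as well.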
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 4:01 PM, Sven Oehme > wrote: the only way to get max number of subblocks for a 5.0.x filesystem with the released code is to have metadata and data use the same blocksize. sven On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L > wrote: All, Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB ? so does this go back to what Marc is saying that there?s really only one sub blocks per block parameter? If so, is there any way to get what I want as described below? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L > wrote: Hi Sven, OK ? but why? I mean, that?s not what the man page says. Where does that ?4 x? come from? And, most importantly ? that?s not what I want. I want a smaller block size for the system pool since it?s metadata only and on RAID 1 mirrors (HD?s on the test cluster but SSD?s on the production cluster). So ? side question ? is 1 MB OK there? But I want a 4 MB block size for data with an 8 KB sub block ? I want good performance for the sane people using our cluster without unduly punishing the ? ahem ? fine folks whose apps want to create a bazillion tiny files! So how do I do that? Thanks! ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 1:41 PM, Sven Oehme > wrote: the number of subblocks is derived by the smallest blocksize in any pool of a given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in the metadata pool, but 4 x of that in the data pool if your data pool is 4M. sven On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop > wrote: Marc, Kevin, We'll be looking into this issue, since at least at a first glance, it does look odd. A 4MB block size should have resulted in an 8KB subblock size. I suspect that, somehow, the --metadata-block-size 1M may have resulted in 32768 Minimum fragment (subblock) size in bytes (other pools) but I do not yet understand how. The subblocks-per-full-block parameter is not supported with mmcrfs . Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 "Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per- From: "Marc A Kaplan" > To: gpfsug main discussion list > Date: 08/01/2018 01:21 PM Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I haven't looked into all the details but here's a clue -- notice there is only one "subblocks-per-full-block" parameter. And it is the same for both metadata blocks and datadata blocks. So maybe (MAYBE) that is a constraint somewhere... Certainly, in the currently supported code, that's what you get. From: "Buterbaugh, Kevin L" > To: gpfsug main discussion list > Date: 08/01/2018 12:55 PM Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am setting up a new filesystem there using hardware that we recently life-cycled out of our production environment. I ?successfully? created a filesystem but I believe the sub-block size is wrong. I?m using a 4 MB filesystem block size, so according to the mmcrfs man page the sub-block size should be 8K: Table 1. Block sizes and subblock sizes +???????????????????????????????+???????????????????????????????+ | Block size | Subblock size | +???????????????????????????????+???????????????????????????????+ | 64 KiB | 2 KiB | +???????????????????????????????+???????????????????????????????+ | 128 KiB | 4 KiB | +???????????????????????????????+???????????????????????????????+ | 256 KiB, 512 KiB, 1 MiB, 2 | 8 KiB | | MiB, 4 MiB | | +???????????????????????????????+???????????????????????????????+ | 8 MiB, 16 MiB | 16 KiB | +???????????????????????????????+???????????????????????????????+ However, it appears that it?s 8K for the system pool but 32K for the other pools: flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Wed Aug 1 11:39:39 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? 
-d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority Output of mmcrfs: mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 8 % complete on Wed Aug 1 11:39:19 2018 18 % complete on Wed Aug 1 11:39:24 2018 27 % complete on Wed Aug 1 11:39:29 2018 37 % complete on Wed Aug 1 11:39:34 2018 48 % complete on Wed Aug 1 11:39:39 2018 60 % complete on Wed Aug 1 11:39:44 2018 72 % complete on Wed Aug 1 11:39:49 2018 83 % complete on Wed Aug 1 11:39:54 2018 95 % complete on Wed Aug 1 11:39:59 2018 100 % complete on Wed Aug 1 11:40:01 2018 Creating Allocation Maps Creating Log Files 3 % complete on Wed Aug 1 11:40:07 2018 28 % complete on Wed Aug 1 11:40:14 2018 53 % complete on Wed Aug 1 11:40:19 2018 78 % complete on Wed Aug 1 11:40:24 2018 100 % complete on Wed Aug 1 11:40:25 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 85 % complete on Wed Aug 1 11:40:32 2018 100 % complete on Wed Aug 1 11:40:33 2018 Formatting Allocation Map for storage pool raid1 53 % complete on Wed Aug 1 11:40:38 2018 100 % complete on Wed Aug 1 11:40:42 2018 Formatting Allocation Map for storage pool raid6 20 % complete on Wed Aug 1 11:40:47 2018 39 % complete on Wed Aug 1 11:40:52 2018 60 % complete on Wed Aug 1 11:40:57 2018 79 % complete on Wed Aug 1 11:41:02 2018 100 % complete on Wed Aug 1 11:41:08 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
And contents of stanza file: %nsd: nsd=test21A3nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd3,testnsd1,testnsd2 device=dm-15 %nsd: nsd=test21A4nsd usage=metadataOnly failureGroup=210 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-14 %nsd: nsd=test21B3nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd1,testnsd2,testnsd3 device=dm-17 %nsd: nsd=test21B4nsd usage=metadataOnly failureGroup=211 pool=system servers=testnsd2,testnsd3,testnsd1 device=dm-16 %nsd: nsd=test23Ansd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-10 %nsd: nsd=test23Bnsd usage=dataOnly failureGroup=23 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-9 %nsd: nsd=test23Cnsd usage=dataOnly failureGroup=23 pool=raid1 servers=testnsd1,testnsd2,testnsd3 device=dm-5 %nsd: nsd=test24Ansd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd3,testnsd1,testnsd2 device=dm-6 %nsd: nsd=test24Bnsd usage=dataOnly failureGroup=24 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-0 %nsd: nsd=test24Cnsd usage=dataOnly failureGroup=24 pool=raid1 servers=testnsd2,testnsd3,testnsd1 device=dm-2 %nsd: nsd=test25Ansd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd1,testnsd2,testnsd3 device=dm-6 %nsd: nsd=test25Bnsd usage=dataOnly failureGroup=25 pool=raid6 servers=testnsd2,testnsd3,testnsd1 device=dm-6 %nsd: nsd=test25Cnsd usage=dataOnly failureGroup=25 pool=raid1 servers=testnsd3,testnsd1,testnsd2 device=dm-3 %pool: pool=system blockSize=1M usage=metadataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid6 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no %pool: pool=raid1 blockSize=4M usage=dataOnly layoutMap=scatter allowWriteAffinity=no What am I missing or what have I done wrong? Thanks? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C8a00ac1e037d45913c8708d5f7de60ac%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687456834221377&sdata=MuPoxpCweqPxLR%2FAaWIgP%2BIkh0bUEVeG3cCzwoZoyE0%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C23d636037b234fbbf9e908d5f7f1fcd1%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687541066564165&sdata=Z1tfD%2BMI1piJAtaBXQ2y9MEGNNLqCyKgHHws2wHmiTo%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 2 01:00:47 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 1 Aug 2018 20:00:47 -0400 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Aug 2 01:11:51 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 2 Aug 2018 00:11:51 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: , <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Thu Aug 2 01:52:19 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 1 Aug 2018 20:52:19 -0400 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org> > On Aug 1, 2018, at 8:11 PM, Andrew Beattie wrote: > [?] > > which is probably why 32k sub block was the default for so many years .... I may not be remembering correctly, but I thought the default block size was 256k, and the sub-block size was always fixed at 1/32nd of the block size ? which only yields 32k sub-blocks for a 1MB block size. I also think there used to be something special about a 16k block size? but I haven?t slept well in about a week, so I might just be losing it. -- Stephen -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Aug 2 02:10:10 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 2 Aug 2018 01:10:10 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org> References: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org>, <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Aug 2 09:44:02 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 2 Aug 2018 16:44:02 +0800 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: <59D32F54-3A88-469D-9D44-CE12B675E95A@ulmer.org><3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: In released GPFS, we only support one subblocks-per-fullblock in one file system, like Sven mentioned that the subblocks-per-fullblock is derived by the smallest block size of metadata and data pools, the smallest block size decides the subblocks-per-fullblock and subblock size of all pools. There's an enhancement plan to have pools with different block sizes and/or subblocks-per-fullblock. Thanks, Yuan, Zheng Cai From: "Andrew Beattie" To: gpfsug-discuss at spectrumscale.org Date: 2018/08/02 09:10 Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Sent by: gpfsug-discuss-bounces at spectrumscale.org Stephen, Sorry your right, I had to go back and look up what we were doing for metadata. 
but we ended up with 1MB block for metadata and 8MB for data and a 32k subblock based on the 1MB metadata block size, effectively a 256k subblock for the Data Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Stephen Ulmer Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 11:00 AM On Aug 1, 2018, at 8:11 PM, Andrew Beattie wrote: [?] which is probably why 32k sub block was the default for so many years .... I may not be remembering correctly, but I thought the default block size was 256k, and the sub-block size was always fixed at 1/32nd of the block size ? which only yields 32k sub-blocks for a 1MB block size. I also think there used to be something special about a 16k block size? but I haven?t slept well in about a week, so I might just be losing it. -- Stephen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Aug 2 16:56:20 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 2 Aug 2018 11:56:20 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: https://www.linkedin.com/in/oehmes/ Apparently, Sven is now "Chief Research Officer at DDN" -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Aug 2 17:01:58 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 2 Aug 2018 16:01:58 +0000 Subject: [gpfsug-discuss] Sven Oehme now at DDN Message-ID: <4D2B1925-2C14-47F8-A1A5-8E4EBA211462@nuance.com> Yes, I heard about this last week - Best of luck and congratulations Sven! I?m sure he?ll be around many of the GPFS events on the future. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Thursday, August 2, 2018 at 10:56 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] Sven Oehme now at DDN https://www.linkedin.com/in/oehmes/ Apparently, Sven is now "Chief Research Officer at DDN" -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Aug 2 21:31:39 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 2 Aug 2018 20:31:39 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? 
In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... 
OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Aug 2 22:14:51 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 2 Aug 2018 21:14:51 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> <1772373B-B371-46AF-A61F-1B310B6BC1A7@vanderbilt.edu> Message-ID: OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? Kevin /root/gpfs root at testnsd1# mmdelfs gpfs5 All data on the following disks of gpfs5 will be destroyed: test21A3nsd test21A4nsd test21B3nsd test21B4nsd test23Ansd test23Bnsd test23Cnsd test24Ansd test24Bnsd test24Cnsd test25Ansd test25Bnsd test25Cnsd Completed deletion of file system /dev/gpfs5. mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. 
Creating Inode File 12 % complete on Thu Aug 2 13:16:26 2018 25 % complete on Thu Aug 2 13:16:31 2018 38 % complete on Thu Aug 2 13:16:36 2018 50 % complete on Thu Aug 2 13:16:41 2018 62 % complete on Thu Aug 2 13:16:46 2018 74 % complete on Thu Aug 2 13:16:52 2018 85 % complete on Thu Aug 2 13:16:57 2018 96 % complete on Thu Aug 2 13:17:02 2018 100 % complete on Thu Aug 2 13:17:03 2018 Creating Allocation Maps Creating Log Files 3 % complete on Thu Aug 2 13:17:09 2018 28 % complete on Thu Aug 2 13:17:15 2018 53 % complete on Thu Aug 2 13:17:20 2018 78 % complete on Thu Aug 2 13:17:26 2018 100 % complete on Thu Aug 2 13:17:27 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 98 % complete on Thu Aug 2 13:17:34 2018 100 % complete on Thu Aug 2 13:17:34 2018 Formatting Allocation Map for storage pool raid1 52 % complete on Thu Aug 2 13:17:39 2018 100 % complete on Thu Aug 2 13:17:43 2018 Formatting Allocation Map for storage pool raid6 24 % complete on Thu Aug 2 13:17:48 2018 50 % complete on Thu Aug 2 13:17:53 2018 74 % complete on Thu Aug 2 13:17:58 2018 99 % complete on Thu Aug 2 13:18:03 2018 100 % complete on Thu Aug 2 13:18:03 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmlsfs gpfs5 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Thu Aug 2 13:16:47 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority /root/gpfs root at testnsd1# ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L > wrote: Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... 
OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Aug 3 07:01:42 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 3 Aug 2018 06:01:42 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: Message-ID: Can u share your stanza file ? Von meinem iPhone gesendet > Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L : > > OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? > > Kevin > > /root/gpfs > root at testnsd1# mmdelfs gpfs5 > All data on the following disks of gpfs5 will be destroyed: > test21A3nsd > test21A4nsd > test21B3nsd > test21B4nsd > test23Ansd > test23Bnsd > test23Cnsd > test24Ansd > test24Bnsd > test24Cnsd > test25Ansd > test25Bnsd > test25Cnsd > Completed deletion of file system /dev/gpfs5. > mmdelfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > /root/gpfs > root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M > > The following disks of gpfs5 will be formatted on node testnsd3: > test21A3nsd: size 953609 MB > test21A4nsd: size 953609 MB > test21B3nsd: size 953609 MB > test21B4nsd: size 953609 MB > test23Ansd: size 15259744 MB > test23Bnsd: size 15259744 MB > test23Cnsd: size 1907468 MB > test24Ansd: size 15259744 MB > test24Bnsd: size 15259744 MB > test24Cnsd: size 1907468 MB > test25Ansd: size 15259744 MB > test25Bnsd: size 15259744 MB > test25Cnsd: size 1907468 MB > Formatting file system ... > Disks up to size 8.29 TB can be added to storage pool system. > Disks up to size 16.60 TB can be added to storage pool raid1. > Disks up to size 132.62 TB can be added to storage pool raid6. 
> Creating Inode File > 12 % complete on Thu Aug 2 13:16:26 2018 > 25 % complete on Thu Aug 2 13:16:31 2018 > 38 % complete on Thu Aug 2 13:16:36 2018 > 50 % complete on Thu Aug 2 13:16:41 2018 > 62 % complete on Thu Aug 2 13:16:46 2018 > 74 % complete on Thu Aug 2 13:16:52 2018 > 85 % complete on Thu Aug 2 13:16:57 2018 > 96 % complete on Thu Aug 2 13:17:02 2018 > 100 % complete on Thu Aug 2 13:17:03 2018 > Creating Allocation Maps > Creating Log Files > 3 % complete on Thu Aug 2 13:17:09 2018 > 28 % complete on Thu Aug 2 13:17:15 2018 > 53 % complete on Thu Aug 2 13:17:20 2018 > 78 % complete on Thu Aug 2 13:17:26 2018 > 100 % complete on Thu Aug 2 13:17:27 2018 > Clearing Inode Allocation Map > Clearing Block Allocation Map > Formatting Allocation Map for storage pool system > 98 % complete on Thu Aug 2 13:17:34 2018 > 100 % complete on Thu Aug 2 13:17:34 2018 > Formatting Allocation Map for storage pool raid1 > 52 % complete on Thu Aug 2 13:17:39 2018 > 100 % complete on Thu Aug 2 13:17:43 2018 > Formatting Allocation Map for storage pool raid6 > 24 % complete on Thu Aug 2 13:17:48 2018 > 50 % complete on Thu Aug 2 13:17:53 2018 > 74 % complete on Thu Aug 2 13:17:58 2018 > 99 % complete on Thu Aug 2 13:18:03 2018 > 100 % complete on Thu Aug 2 13:18:03 2018 > Completed creation of file system /dev/gpfs5. > mmcrfs: Propagating the cluster configuration data to all > affected nodes. This is an asynchronous process. > /root/gpfs > root at testnsd1# mmlsfs gpfs5 > flag value description > ------------------- ------------------------ ----------------------------------- > -f 8192 Minimum fragment (subblock) size in bytes (system pool) > 32768 Minimum fragment (subblock) size in bytes (other pools) > -i 4096 Inode size in bytes > -I 32768 Indirect block size in bytes > -m 2 Default number of metadata replicas > -M 3 Maximum number of metadata replicas > -r 1 Default number of data replicas > -R 3 Maximum number of data replicas > -j scatter Block allocation type > -D nfs4 File locking semantics in effect > -k all ACL semantics in effect > -n 32 Estimated number of nodes that will mount file system > -B 1048576 Block size (system pool) > 4194304 Block size (other pools) > -Q user;group;fileset Quotas accounting enabled > user;group;fileset Quotas enforced > none Default quotas enabled > --perfileset-quota No Per-fileset quota enforcement > --filesetdf No Fileset df enabled? > -V 19.01 (5.0.1.0) File system version > --create-time Thu Aug 2 13:16:47 2018 File system creation time > -z No Is DMAPI enabled? > -L 33554432 Logfile size > -E Yes Exact mtime mount option > -S relatime Suppress atime mount option > -K whenpossible Strict replica allocation option > --fastea Yes Fast external attributes enabled? > --encryption No Encryption enabled? > --inode-limit 101095424 Maximum number of inodes > --log-replicas 0 Number of log replicas > --is4KAligned Yes is4KAligned? > --rapid-repair Yes rapidRepair enabled? > --write-cache-threshold 0 HAWC Threshold (max 65536) > --subblocks-per-full-block 128 Number of subblocks per full block > -P system;raid1;raid6 Disk storage pools in file system > --file-audit-log No File Audit Logging enabled? > --maintenance-mode No Maintenance Mode enabled? 
> -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system > -A yes Automatic mount option > -o none Additional mount options > -T /gpfs5 Default mount point > --mount-priority 0 Mount priority > /root/gpfs > root at testnsd1# > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > >> On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L wrote: >> >> Hi All, >> >> Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. >> >> Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. >> >> So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. >> >> Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? >> >> Kevin >> >> ? >> Kevin Buterbaugh - Senior System Administrator >> Vanderbilt University - Advanced Computing Center for Research and Education >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 >> >>> On Aug 1, 2018, at 7:11 PM, Andrew Beattie wrote: >>> >>> I too would second the comment about doing testing specific to your environment >>> >>> We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. >>> >>> We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. >>> >>> Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance >>> >>> which is probably why 32k sub block was the default for so many years .... >>> Andrew Beattie >>> Software Defined Storage - IT Specialist >>> Phone: 614-2133-7927 >>> E-mail: abeattie at au1.ibm.com >>> >>> >>> ----- Original message ----- >>> From: "Marc A Kaplan" >>> Sent by: gpfsug-discuss-bounces at spectrumscale.org >>> To: gpfsug main discussion list >>> Cc: >>> Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? >>> Date: Thu, Aug 2, 2018 10:01 AM >>> >>> Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. >>> >>> Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. 
Sometimes they are available via commands, and/or configuration settings, sometimes not. >>> >>> Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". >>> >>> Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. >>> Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... >>> >>> OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Fri Aug 3 07:53:31 2018 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Fri, 3 Aug 2018 08:53:31 +0200 Subject: [gpfsug-discuss] Sven, the man with the golden gun now at DDN Message-ID: FYI - Sven is on a TOP secret mission called "Skyfall"; with his spirit, super tech skills and know-how he will educate and convert all the poor Lustre souls which are fighting for the world leadership. The GPFS-Q-team in Poughkeepsie has prepared him a golden Walther PPK (9mm) with lot's of Scale v5. silver bullets. He was given a top secret make_all_kind_of_I/O faster debugger with auto tuning features. And off course he received a new car by Aston Martin with lot's of special features designed by POK. It has dual V20-cores, lots of RAM, a Mestor-transmission, twin-port RoCE turbochargers, AFM Rockets and LROC escape seats. Poughkeepsie is still in the process to hire a larger group of smart and good looking NMVeOF I/O girls; feel free to send your ideas and pictures. The list of selected "Sven Girls" with be published in a new section in the Scale FAQ. -frank- -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Fri Aug 3 13:49:48 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 3 Aug 2018 12:49:48 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: Message-ID: <11A27CF3-7484-45A8-ACFB-82B1F772A99B@vanderbilt.edu> Hi All, Aargh - now I really do feel like an idiot! 
I had set up the stanza file over a week ago ? then had to work on production issues ? and completely forgot about setting the block size in the pool stanzas there. But at least we all now know that stanza files override command line arguments to mmcrfs. My apologies? Kevin On Aug 3, 2018, at 1:01 AM, Olaf Weiser > wrote: Can u share your stanza file ? Von meinem iPhone gesendet Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L >: OK, so hold on ? NOW what?s going on??? I deleted the filesystem ? went to lunch ? came back an hour later ? recreated the filesystem with a metadata block size of 4 MB ? and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools? Kevin /root/gpfs root at testnsd1# mmdelfs gpfs5 All data on the following disks of gpfs5 will be destroyed: test21A3nsd test21A4nsd test21B3nsd test21B4nsd test23Ansd test23Bnsd test23Cnsd test24Ansd test24Bnsd test24Cnsd test25Ansd test25Bnsd test25Cnsd Completed deletion of file system /dev/gpfs5. mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. /root/gpfs root at testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M The following disks of gpfs5 will be formatted on node testnsd3: test21A3nsd: size 953609 MB test21A4nsd: size 953609 MB test21B3nsd: size 953609 MB test21B4nsd: size 953609 MB test23Ansd: size 15259744 MB test23Bnsd: size 15259744 MB test23Cnsd: size 1907468 MB test24Ansd: size 15259744 MB test24Bnsd: size 15259744 MB test24Cnsd: size 1907468 MB test25Ansd: size 15259744 MB test25Bnsd: size 15259744 MB test25Cnsd: size 1907468 MB Formatting file system ... Disks up to size 8.29 TB can be added to storage pool system. Disks up to size 16.60 TB can be added to storage pool raid1. Disks up to size 132.62 TB can be added to storage pool raid6. Creating Inode File 12 % complete on Thu Aug 2 13:16:26 2018 25 % complete on Thu Aug 2 13:16:31 2018 38 % complete on Thu Aug 2 13:16:36 2018 50 % complete on Thu Aug 2 13:16:41 2018 62 % complete on Thu Aug 2 13:16:46 2018 74 % complete on Thu Aug 2 13:16:52 2018 85 % complete on Thu Aug 2 13:16:57 2018 96 % complete on Thu Aug 2 13:17:02 2018 100 % complete on Thu Aug 2 13:17:03 2018 Creating Allocation Maps Creating Log Files 3 % complete on Thu Aug 2 13:17:09 2018 28 % complete on Thu Aug 2 13:17:15 2018 53 % complete on Thu Aug 2 13:17:20 2018 78 % complete on Thu Aug 2 13:17:26 2018 100 % complete on Thu Aug 2 13:17:27 2018 Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool system 98 % complete on Thu Aug 2 13:17:34 2018 100 % complete on Thu Aug 2 13:17:34 2018 Formatting Allocation Map for storage pool raid1 52 % complete on Thu Aug 2 13:17:39 2018 100 % complete on Thu Aug 2 13:17:43 2018 Formatting Allocation Map for storage pool raid6 24 % complete on Thu Aug 2 13:17:48 2018 50 % complete on Thu Aug 2 13:17:53 2018 74 % complete on Thu Aug 2 13:17:58 2018 99 % complete on Thu Aug 2 13:18:03 2018 100 % complete on Thu Aug 2 13:18:03 2018 Completed creation of file system /dev/gpfs5. mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. 
/root/gpfs root at testnsd1# mmlsfs gpfs5 flag value description ------------------- ------------------------ ----------------------------------- -f 8192 Minimum fragment (subblock) size in bytes (system pool) 32768 Minimum fragment (subblock) size in bytes (other pools) -i 4096 Inode size in bytes -I 32768 Indirect block size in bytes -m 2 Default number of metadata replicas -M 3 Maximum number of metadata replicas -r 1 Default number of data replicas -R 3 Maximum number of data replicas -j scatter Block allocation type -D nfs4 File locking semantics in effect -k all ACL semantics in effect -n 32 Estimated number of nodes that will mount file system -B 1048576 Block size (system pool) 4194304 Block size (other pools) -Q user;group;fileset Quotas accounting enabled user;group;fileset Quotas enforced none Default quotas enabled --perfileset-quota No Per-fileset quota enforcement --filesetdf No Fileset df enabled? -V 19.01 (5.0.1.0) File system version --create-time Thu Aug 2 13:16:47 2018 File system creation time -z No Is DMAPI enabled? -L 33554432 Logfile size -E Yes Exact mtime mount option -S relatime Suppress atime mount option -K whenpossible Strict replica allocation option --fastea Yes Fast external attributes enabled? --encryption No Encryption enabled? --inode-limit 101095424 Maximum number of inodes --log-replicas 0 Number of log replicas --is4KAligned Yes is4KAligned? --rapid-repair Yes rapidRepair enabled? --write-cache-threshold 0 HAWC Threshold (max 65536) --subblocks-per-full-block 128 Number of subblocks per full block -P system;raid1;raid6 Disk storage pools in file system --file-audit-log No File Audit Logging enabled? --maintenance-mode No Maintenance Mode enabled? -d test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd Disks in file system -A yes Automatic mount option -o none Additional mount options -T /gpfs5 Default mount point --mount-priority 0 Mount priority /root/gpfs root at testnsd1# ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L > wrote: Hi All, Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective. Anyway ? we have done some testing which has shown that a 4 MB block size is best for those workloads that use ?normal? sized files. However, we - like many similar institutions - support a mixed workload, so the 128K fragment size that comes with that is not optimal for the primarily biomedical type applications that literally create millions of very small files. That?s why we settled on 1 MB as a compromise. So we?re very eager to now test with GPFS 5, a 4 MB block size, and a 8K fragment size. I?m recreating my test cluster filesystem now with that config ? so 4 MB block size on the metadata only system pool, too. Thanks to all who took the time to respond to this thread. I hope it?s been beneficial to others as well? Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 1, 2018, at 7:11 PM, Andrew Beattie > wrote: I too would second the comment about doing testing specific to your environment We recently deployed a number of ESS building blocks into a customer site that was specifically being used for a mixed HPC workload. We spent more than a week playing with different block sizes for both data and metadata trying to identify which variation would provide the best mix of both metadata performance and data performance. one thing we noticed very early on is that MDtest and IOR both respond very differently as you play with both block size and subblock size. What works for one use case may be a very poor option for another use case. Interestingly enough it turned out that the best overall option for our particular use case was an 8MB block size with 32k sub blocks -- as that gave us good Metadata performance and good sequential data performance which is probably why 32k sub block was the default for so many years .... Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Marc A Kaplan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Date: Thu, Aug 2, 2018 10:01 AM Firstly, I do suggest that you run some tests and see how much, if any, difference the settings that are available make in performance and/or storage utilization. Secondly, as I and others have hinted at, deeper in the system, there may be additional parameters and settings. Sometimes they are available via commands, and/or configuration settings, sometimes not. Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs". Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable. Sometimes it's because we want to save you from making ridiculous settings that would lead to problems... OTOH, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as for so many years subblocks per block seemed forever frozen at the number 32. Now it varies... and then the discussion shifts to why can't it be even more flexible? 
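To pull the arithmetic of this thread together: the subblocks-per-full-block value is set by the smallest block size in the file system, so with --metadata-block-size 1M the system pool fixes it at 1 MiB / 8 KiB = 128, and the 4 MiB data pools then end up with 4 MiB / 128 = 32 KiB fragments, which is exactly what mmlsfs reported. With every pool at 4 MiB you should instead get 4 MiB / 8 KiB = 512 subblocks per full block and an 8 KiB fragment everywhere. And, as worked out above, the blockSize= lines in the %pool stanzas are what actually took effect, overriding -B and --metadata-block-size on the mmcrfs command line, so the stanza file has to carry the intended value as well. A minimal sketch of what the %pool section of the stanza file would then look like (pool names are the ones used in this thread; this is an illustration, not verified output from the test cluster):

%pool:
pool=system
blockSize=4M
usage=metadataOnly
layoutMap=scatter
allowWriteAffinity=no

%pool:
pool=raid6
blockSize=4M
usage=dataOnly
layoutMap=scatter
allowWriteAffinity=no

%pool:
pool=raid1
blockSize=4M
usage=dataOnly
layoutMap=scatter
allowWriteAffinity=no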
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb821b9e8a6db4408fff308d5f80c907d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636687655210056012&sdata=SCzz05SABDQ0vxprDYfdKGOY1VES%2Fm0tIr2kRnGlY4c%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C050353d8d80b4e272ab708d5f8b70361%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688387286266248&sdata=d1rBsXZEn1BlkmvHGKHvkk2%2FWmXAppqS5SbOQF0ZCrY%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C89b5017f862b465a9ee908d5f9069a29%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636688729119843837&sdata=0vjRu2TsZ5%2Bf84Sb7%2BTEdi8%2BmLGGpbqq%2FXNg2zfJRiw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Fri Aug 3 20:37:50 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 3 Aug 2018 12:37:50 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 Message-ID: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> All, Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: ? the draft agenda (bottom of page), ? a link to registration, register by September 1 due to ORNL site requirements (see next line) ? an important note about registration requirements for going to Oak Ridge National Lab ? a request for your site presentations ? information about HPCXXL and who to contact for information about joining, and ? other upcoming events. Hope you can attend and see Summit and Alpine first hand. Best, Kristy Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. About HPCXXL: HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. 
We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff.

The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group.

To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact HPCXXL president Michael Stephan at m.stephan at fz-juelich.de.

Other upcoming GPFS/SS events:
Sep 19+20  HPCXXL, Oak Ridge
Aug 10     Meetup along TechU, Sydney
Oct 24     NYC User Meeting, New York
Nov 11     SC, Dallas
Dec 12     CIUK, Manchester

Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/

Start-End (Duration)   Title                                     Speaker

Wednesday 19th, 2018
13:00-13:15 (15 min)   Welcome                                   TBD
13:15-13:45 (30 min)   What is new in Spectrum Scale?            Chris Maestas (IBM)
13:45-14:00 (15 min)   What is new in ESS?                       TBD (IBM)
14:00-14:25 (25 min)   Spinning up a Hadoop cluster on demand    TBD (IBM)
14:25-14:50 (25 min)   Running Container on a Super Computer     John Lewars (IBM)
14:50-15:20 (30 min)   === BREAK ===
15:20-15:40 (20 min)   AWE                                       *** TO BE CONFIRMED ***
15:40-16:00 (20 min)   CSCS site report                          *** TO BE CONFIRMED ***
16:00-16:20 (20 min)   Starfish (Sponsor talk)                   TBD (Starfish)
16:20-16:50 (30 min)   Network Flow                              John Lewars (IBM)
16:50-17:20 (30 min)   RFEs                                      Carl Zetie (IBM)
17:20-17:30 (10 min)   Wrap-up                                   TBD

Thursday 20th, 2018
08:30-08:50 (20 min)   Alpine - the Summit file system           TBD (ORNL)
08:50-09:20 (30 min)   Performance enhancements for CORAL        TBD (IBM)
09:20-09:40 (20 min)   ADIOS I/O library                         William Godoy (ORNL)
09:40-10:00 (20 min)   AI Reference Architecture                 Ted Hoover (IBM)
10:00-10:30 (30 min)   === BREAK ===
10:30-11:00 (30 min)   Encryption on the wire and on rest        Sandeep Ramesh (IBM)
11:00-11:30 (30 min)   Service Update                            *** TO BE CONFIRMED ***
11:30-12:00 (30 min)   Open Forum                                All

From Kevin.Buterbaugh at Vanderbilt.Edu Mon Aug 6 19:34:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 6 Aug 2018 18:34:34 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: References: Message-ID: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu>

Hi All,

So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do look at the documentation even if it seems sometimes that I don't!) for some other information and happened to come across this at the bottom of page 358:

The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a different block size from the user pools. This can be especially beneficial if the default block size is larger than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly disks.
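(A quick way to see what a given mmcrfs invocation actually produced is to read the per-pool values back with mmlsfs; the filesystem name below is a placeholder:

mmlsfs testfs -B   # block size, reported separately for the system pool and the data pools
mmlsfs testfs -f   # minimum fragment (sub-block) size per pool

The full "mmlsfs testfs" output also includes the --subblocks-per-full-block value, which ties the two numbers together.)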
Given that one of the responses I received during this e-mail thread was from an IBM engineer basically pointing out that there is no benefit in setting the metadata-block-size to less than 4 MB if that?s what I want for the filesystem block size, this might be a candidate for a documentation update. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hnguyen at cray.com Mon Aug 6 20:52:28 2018 From: hnguyen at cray.com (Hoang Nguyen) Date: Mon, 6 Aug 2018 19:52:28 +0000 Subject: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? In-Reply-To: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> References: <60B6991C-8021-470E-BD71-B4885C726957@vanderbilt.edu> Message-ID: <7A96225E-B939-411F-B4C4-458DD4470B4D@cray.com> That comment in the Administration guide is a legacy comment when Metadata sub-block size was restricted to 1/32 of the Metadata block size. In the past, creating large Metadata block sizes also meant large sub-blocks and hence large directory blocks which wasted a lot of space. From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Monday, August 6, 2018 at 11:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem? Hi All, So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do look at the documentation even if it seems sometimes that I don?t!) for some other information and happened to come across this at the bottom of page 358: The --metadata-block-size flag on the mmcrfs command can be used to create a system pool with a different block size from the user pools. This can be especially beneficial if the default block size is larger than 1 MB. If data and metadata block sizes differ, the system pool must contain only metadataOnly disks. Given that one of the responses I received during this e-mail thread was from an IBM engineer basically pointing out that there is no benefit in setting the metadata-block-size to less than 4 MB if that?s what I want for the filesystem block size, this might be a candidate for a documentation update. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Aug 6 22:42:54 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 6 Aug 2018 21:42:54 +0000 Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. 
However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From esperle at us.ibm.com Mon Aug 6 23:46:39 2018 From: esperle at us.ibm.com (Eric Sperley) Date: Mon, 6 Aug 2018 15:46:39 -0700 Subject: [gpfsug-discuss] mmaddcallback documentation issue In-Reply-To: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> References: <735F4275-191A-4363-B98C-1EA289292037@vanderbilt.edu> Message-ID: See if this helps https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_mmaddcallback.htm Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From peter.chase at metoffice.gov.uk Tue Aug 7 12:35:17 2018 From: peter.chase at metoffice.gov.uk (Chase, Peter) Date: Tue, 7 Aug 2018 11:35:17 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue Message-ID: Hi Kevin, I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are not for your version of Spectrum Scale) state that mmstartpolicy is for internal GPFS use only: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Using+Policies So if the above link is correct, I'd recommend switching to using mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy doesn't and might have you fumbling around in the dark. As for the issue you're experiencing with adding a callback, it looks like the mmaddcallback command is catching the --single-instance flag as an argument for it, not as a parameter for the mmstartpolicy command. After looking at the documentation you've referenced, I suspect that there's a typo/omission in the command and it should have a trailing double quote (") on the end of the parms argument list, i.e.: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance" I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. Regards, Pete Chase peter.chase at metoffice.gov.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: 06 August 2018 23:47 To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. mmaddcallback documentation issue (Buterbaugh, Kevin L) 2. Re: mmaddcallback documentation issue (Eric Sperley) ---------------------------------------------------------------------- Message: 1 Date: Mon, 6 Aug 2018 21:42:54 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037 at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. 
However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Mon, 6 Aug 2018 15:46:39 -0700 From: "Eric Sperley" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: Content-Type: text/plain; charset="utf-8" See if this helps https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adm_mmaddcallback.htm Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 21 ********************************************** From UWEFALKE at de.ibm.com Tue Aug 7 13:30:48 2018 From: UWEFALKE at de.ibm.com (Uwe Falke) Date: Tue, 7 Aug 2018 14:30:48 +0200 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: "I'm not sure how we go about asking IBM to correct their documentation,..." One way would be to open a PMR, er?, case. Mit freundlichen Gr??en / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefalke at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Business & Technology Services GmbH / Gesch?ftsf?hrung: Thomas Wolter, Sven Schoo? Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From Kevin.Buterbaugh at Vanderbilt.Edu Tue Aug 7 17:14:27 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 7 Aug 2018 16:14:27 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: <3F1F205C-B3EB-44CF-BC47-84FDF335FBEF@vanderbilt.edu> Hi All, I was able to navigate down thru IBM?s website and find the GPFS 5.0.1 manuals but they contain the same typo, which Pete has correctly identified ? and I have confirmed that his solution works. Thanks... ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Aug 7, 2018, at 6:35 AM, Chase, Peter > wrote: Hi Kevin, I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are not for your version of Spectrum Scale) state that mmstartpolicy is for internal GPFS use only: https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral%2BParallel%2BFile%2BSystem%2B(GPFS)%2Fpage%2FUsing%2BPolicies&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912985631&sdata=4PmYIvmKenhqtLRVhusaQpWHAjGcd6YFMkb5nMa%2Bwuw%3D&reserved=0 So if the above link is correct, I'd recommend switching to using mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy doesn't and might have you fumbling around in the dark. As for the issue you're experiencing with adding a callback, it looks like the mmaddcallback command is catching the --single-instance flag as an argument for it, not as a parameter for the mmstartpolicy command. 
After looking at the documentation you've referenced, I suspect that there's a typo/omission in the command and it should have a trailing double quote (") on the end of the parms argument list, i.e.: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance" I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. Regards, Pete Chase peter.chase at metoffice.gov.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: 06 August 2018 23:47 To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 21 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. mmaddcallback documentation issue (Buterbaugh, Kevin L) 2. Re: mmaddcallback documentation issue (Eric Sperley) ---------------------------------------------------------------------- Message: 1 Date: Mon, 6 Aug 2018 21:42:54 +0000 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Subject: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: <735F4275-191A-4363-B98C-1EA289292037 at vanderbilt.edu> Content-Type: text/plain; charset="utf-8" Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ Message: 2 Date: Mon, 6 Aug 2018 15:46:39 -0700 From: "Eric Sperley" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] mmaddcallback documentation issue Message-ID: Content-Type: text/plain; charset="utf-8" See if this helps https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.1%2Fcom.ibm.spectrum.scale.v5r01.doc%2Fbl1adm_mmaddcallback.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=WGASrQ8SqzMdkTkNRkeAEDoaACsnDZEAJF8G5GBIxsA%3D&reserved=0 Best Regards, Eric Eric Sperley, PhD SDI Architect To improve is to change; to be perfect is IBM Systems to change often - - Winston Churchill esperle at us.ibm.com +15033088721 From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 08/06/2018 02:44 PM Subject: [gpfsug-discuss] mmaddcallback documentation issue Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, So I?m _still_ reading about and testing various policies for file placement and migration on our test cluster (which is now running GPFS 5). On page 392 of the GPFS 5.0.0 Administration Guide it says: To add a callback, run this command. The following command is on one line: mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace --parms "%eventName %fsName --single-instance The --single-instance flag is required to avoid running multiple migrations on the file system at the same time. However, trying to issue that command gives: mmaddcallback: Incorrect option: --single-instance And the man page for mmaddcallback doesn?t mention it or anything similar to it. Now my test cluster is running GPFS 5.0.1.1, so is this something that was added in GPFS 5.0.0 and then subsequently removed? I can?t find the GPFS 5.0.1 Administration Guide with a Google search. Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A910265.gif Type: image/gif Size: 481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1A526482.gif Type: image/gif Size: 2322 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 End of gpfsug-discuss Digest, Vol 79, Issue 21 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C806e69ddb2294dbe5ad008d5fc5b2e70%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636692390912995641&sdata=1kVV9WbthdhHHEX32bT0C3uUJlVTAtMrV6tEFiT9%2BzY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Tue Aug 7 17:58:45 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Tue, 7 Aug 2018 16:58:45 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue In-Reply-To: References: Message-ID: >I'm not sure how we go about asking IBM to correct their documentation, but expect someone in the user group will have some idea. File an RFE against Scale and I will route it to the right place. Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From carlz at us.ibm.com Wed Aug 8 13:24:52 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 8 Aug 2018 12:24:52 +0000 Subject: [gpfsug-discuss] Easy way to submit Documentation corrections and enhancements Message-ID: It turns out that there is an easier, faster way to submit corrections and enhancements to the Scale documentation than sending me an RFE. At the bottom of each page in the Knowledge Center, there is a Comments section. You just need to be signed in under your IBM ID to add a comment. And all of the comments are read and processed by our information design team. regards, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From ulmer at ulmer.org Thu Aug 9 05:46:12 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Thu, 9 Aug 2018 00:46:12 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu> <76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu> <21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> Message-ID: <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> But it still shows him employed at IBM through ?present?. So is he on-loan or is it ?permanent?? -- Stephen > On Aug 2, 2018, at 11:56 AM, Marc A Kaplan wrote: > > https://www.linkedin.com/in/oehmes/ > Apparently, Sven is now "Chief Research Officer at DDN" > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Thu Aug 9 06:07:53 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 9 Aug 2018 07:07:53 +0200 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu> <151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 9 14:18:40 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 9 Aug 2018 09:18:40 -0400 Subject: [gpfsug-discuss] Sven Oehme now at DDN In-Reply-To: References: <3865728B-7185-4BE1-9BB0-8730A5CEE6A6@vanderbilt.edu><76EC376F-6040-425E-8F94-61AF3C46961D@vanderbilt.edu><21BF9577-03B9-4E45-BEB7-973FCB18FA7E@vanderbilt.edu><151B98C8-4CF7-42DF-A328-0DAABAE067D0@ulmer.org> Message-ID: https://en.wikipedia.org/wiki/Coopetition -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Thu Aug 9 20:11:27 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 9 Aug 2018 15:11:27 -0400 Subject: [gpfsug-discuss] logAssertFailed question Message-ID: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> Howdy All, We recently had a node running 4.2.3.6 (efix 9billion, sorry can't remember the exact efix) go wonky with a logAssertFailed error that looked similar to the description of this APAR fixed in 4.2.3.8: - Fix an assert in BufferDesc::flushBuffer Assert exp(!addrDirty || synchedStale || allDirty inode 554192 block 10 addrDirty 1 synchedStale 0 allDirty 0 that can happen during shutdown IJ04520 The odd thing is that APAR mentions the error can happen at shutdown and this node wasn't shutting down. In this APAR, can the error also occur when the node is not shutting down? 
Here's the head of the error we saw: Thu Aug 9 11:06:53.977 2018: [X] logAssertFailed: !addrDirty || synchedStale || allDirty Thu Aug 9 11:06:53.978 2018: [X] return code 0, reason code 0, log record tag 0 Thu Aug 9 11:06:57.557 2018: [X] *** Assert exp(!addrDirty || synchedStale || allDirty inode 96666844 snap 0 block 2034 bdP 0x1802F51DE40 addrDirty 1 synchedStale 0 allDirty 0 validBits 3x0-000000000003FFFF dirtyBits 3x0-000000000003FFFF ) in line 7316 of file /build/ode/ttn423ptf6/src/avs/fs/mmfs/ts/fs/bufdesc.C Thu Aug 9 11:06:57.558 2018: [E] *** Traceback: Thu Aug 9 11:06:57.559 2018: [E] 2:0x555555D6A016 logAssertFailed + 0x1B6 at ??:0 Thu Aug 9 11:06:57.560 2018: [E] 3:0x55555594B333 BufferDesc::flushBuffer(int, long long*) + 0x14A3 at ??:0 Thu Aug 9 11:06:57.561 2018: [E] 4:0x555555B483CE GlobalFS::LookForCleanToDo() + 0x2DE at ??:0 Thu Aug 9 11:06:57.562 2018: [E] 5:0x555555B48524 BufferCleanerBody(void*) + 0x74 at ??:0 Thu Aug 9 11:06:57.563 2018: [E] 6:0x555555868556 Thread::callBody(Thread*) + 0x46 at ??:0 Thu Aug 9 11:06:57.564 2018: [E] 7:0x555555855AF2 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0 Thu Aug 9 11:06:57.565 2018: [E] 8:0x7FFFF79C5806 start_thread + 0xE6 at ??:0 Thu Aug 9 11:06:57.566 2018: [E] 9:0x7FFFF6B8567D clone + 0x6D at ??:0 mmfsd: /build/ode/ttn423ptf6/src/avs/fs/mmfs/ts/fs/bufdesc.C:7316: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `!addrDirty || synchedStale || allDirty inode 96666844 snap 0 block 2034 bdP 0x1802F51DE40 addrDirty 1 synchedStale 0 allDirty 0 validBits 3x0-000000000003FFFF dirtyBits 3x0-000000000003FFFF ' failed. Thu Aug 9 11:06:57.586 2018: [E] Signal 6 at location 0x7FFFF6AD9875 in process 10775, link reg 0xFFFFFFFFFFFFFFFF. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From valdis.kletnieks at vt.edu Thu Aug 9 20:25:47 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 09 Aug 2018 15:25:47 -0400 Subject: [gpfsug-discuss] logAssertFailed question In-Reply-To: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> References: <35653ad6-1184-d880-e7d6-0c55c87232f6@nasa.gov> Message-ID: <29489.1533842747@turing-police.cc.vt.edu> On Thu, 09 Aug 2018 15:11:27 -0400, Aaron Knister said: > We recently had a node running 4.2.3.6 (efix 9billion, sorry can't > remember the exact efix) go wonky with a logAssertFailed error that > looked similar to the description of this APAR fixed in 4.2.3.8: > > - Fix an assert in BufferDesc::flushBuffer Assert exp(!addrDirty || > synchedStale || allDirty inode 554192 block 10 addrDirty 1 synchedStale > 0 allDirty 0 that can happen during shutdown IJ04520 Yep. *that* one. Saw it often enough to put a serious crimp in our style. 'logAssertFailed: ! addrDirty || synchedStale || allDirty' It's *totally* possible to hit it in the middle of a production workload. I don't think we ever saw it during shutdown. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Stephan.Peinkofer at lrz.de Fri Aug 10 12:29:18 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 10 Aug 2018 11:29:18 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Message-ID: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Aug 10 13:51:56 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 10 Aug 2018 14:51:56 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: <298030c14ce94fae8f21aefe9d736b84@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 10 14:02:33 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 10 Aug 2018 09:02:33 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: Questions: How/why was the decision made to use a large number (~1000) of independent filesets ? What functions/features/commands are being used that work with independent filesets, that do not also work with "dependent" filesets? -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.lischewski at fz-juelich.de Fri Aug 10 15:25:17 2018 From: m.lischewski at fz-juelich.de (Martin Lischewski) Date: Fri, 10 Aug 2018 16:25:17 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: Hello Olaf, hello Marc, we in J?lich are in the middle of migrating/copying all our old filesystems which were created with filesystem version: 13.23 (3.5.0.7) to new filesystems created with GPFS 5.0.1. We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota". The idea is to create a separate fileset for each group/project. For the users the quota-computation should be much more transparent. From now on all data which is stored inside of their directory (fileset) counts for their quota independent of the ownership. Right now we have round about 900 groups which means we will create round about 900 filesets per filesystem. In one filesystem we will have about 400million inodes (with rising tendency). 
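(For concreteness, a minimal sketch of the per-group layout Martin describes, with invented filesystem, fileset and quota values:

# one independent fileset per group, with its own inode space
mmcrfileset fs1 grp_physics --inode-space new --inode-limit 1000000
mmlinkfileset fs1 grp_physics -J /fs1/grp_physics

# fileset quota: everything under the junction counts against the group, regardless of file ownership
mmsetquota fs1:grp_physics --block 50T:55T --files 900000:1000000

Repeated once per group, this is exactly the pattern that pushes a site with roughly 900 groups up against the current 1000 independent fileset limit.)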
This filesystem we will back up with "mmbackup" so we talked with Dominic Mueller-Wicke and he recommended us to use independent filesets. Because then the policy-runs can be parallelized and we can increase the backup performance. We belive that we require these parallelized policies run to meet our backup performance targets. But there are even more features we enable by using independet filesets. E.g. "Fileset level snapshots" and "user and group quotas inside of a fileset". I did not know about performance issues regarding independent filesets... Can you give us some more information about this? All in all we are strongly supporting the idea of increasing this limit. Do I understand correctly that by opening a PMR IBM allows to increase this limit on special sides? I would rather like to increase the limit and make it official public available and supported. Regards, Martin Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > Hallo Stephan, > the limit is not a hard coded limit ?- technically spoken, you can > raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of > independent filesets generates a lot performance issues, hangs ... at > least noise and partial trouble .. > it might be not the case with your specific workload, because due to > the fact, that you 're running already ?close to 1000 ... > > I suspect , this number of 1000 file sets ?- at the time of > introducing it - was as also just that one had to pick a number... > > ... turns out.. that a general commitment to support > 1000 > ind.fileset is more or less hard.. because what uses cases should we > test / support > I think , there might be a good chance for you , that for your > specific workload, one would allow and support more than 1000 > > do you still have a PMR for your side for this ? ?- if not - I know .. > open PMRs is an additional ...but could you please .. > then we can decide .. if raising the limit is an option for you .. > > > > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, > Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron > , Dorian Krause > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the > J?lich Supercomputing Centre will soon be hitting the current > Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
> > There are also a number of RFEs from other users open, that target > this limitation: > _https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780_ > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534_ > __https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530_ > _https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282_ > > I know GPFS Development was very busy fulfilling the CORAL > requirements but maybe now there is again some time to improve > something else. > > If there are any other users on the list that are approaching the > current limitation in independent filesets, please take some time and > vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5118 bytes Desc: S/MIME Cryptographic Signature URL: From Stephan.Peinkofer at lrz.de Fri Aug 10 16:14:46 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 10 Aug 2018 15:14:46 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> , Message-ID: Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): HPC WORK: Here every project has - for the lifetime of the project - a dedicated storage area that has some fileset quota attached to it, but no further per user or per group quotas are applied here. No backup is taken. Data Science Storage: This is for long term online and collaborative storage. Here projects can get so called "DSS Containers" to which they can give arbitrary users access to via a Self Service Interface (a little bit like Dropbox). Each of this DSS Containers is implemented via a independent fileset so that projects can also specify a per user quota for invited users, we can backup each container efficiently into a different TSM Node via mmbackup and we can run different actions using the mmapplypolicy to a DSS Container. Also we plan to offer our users to enable snapshots on their containers if they wish so. We currently deploy a 2PB file system for this and are in the process of bringing up two additional 10PB file systems for this but already have requests what it would mean if we have to scale this to 50PB. Data Science Archive (Planned): This is for long term archive storage. The usage model will be something similar to DSS but underlying, we plan to use TSM/HSM. 
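(To make the fileset-based workflows above concrete, they map onto commands roughly like the following; the filesystem, fileset, user, node class and TSM server names are all placeholders:

# per-user quota inside a single fileset (needs per-fileset quota enforcement enabled on the filesystem)
mmsetquota fs1:dss_container1 --user alice --block 1T:2T

# back up each independent fileset to its own TSM server, scanning only that fileset's inode space
mmbackup /fs1/dss_container1 --scope inodespace --tsm-servers TSMSERVER1 -N backupnodes
mmbackup /fs1/dss_container2 --scope inodespace --tsm-servers TSMSERVER2 -N backupnodes

# fileset-level snapshot of one container
mmcrsnapshot fs1 dss_container1:snap20180810

The plain fileset quota case could be covered by dependent filesets, but the per-fileset backup scans and fileset snapshots shown here are the pieces that keep pulling these designs toward independent filesets, and hence toward the limit being discussed.)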
Another point, but I don't remember it completely from the top of my head, where people might hit the limit is when they are using your OpenStack Manila integration. As It think your Manila driver creates an independent fileset for each network share in order to be able to provide the per share snapshot feature. So if someone is trying to use ISS in a bigger OS Cloud as Manila Storage the 1000er limit might hit them also. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 3:02 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Questions: How/why was the decision made to use a large number (~1000) of independent filesets ? What functions/features/commands are being used that work with independent filesets, that do not also work with "dependent" filesets? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Fri Aug 10 16:39:50 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 10 Aug 2018 15:39:50 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, Message-ID: Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. 
if raising the limit is an option for you .. Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" To: gpfsug main discussion list Cc: Doris Franke , Uwe Tron , Dorian Krause Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Aug 10 16:51:28 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Aug 2018 15:51:28 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, Message-ID: <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> This is definitely a great candidate for a RFE, if one does not already exist. Not to try and contradict by friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general, which the RFE process is really the main way to do this. I just got off a call with Kristie and Carl about the RFE process and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!! 
So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE (admittedly currently got great) process really is and will be a great way to work together on these common goals and needs for the product we rely so heavily upon! Cheers!! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peinkofer, Stephan Sent: Friday, August 10, 2018 10:40 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. if raising the limit is an option for you .. Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. 
Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke >, Uwe Tron >, Dorian Krause > Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From djohnson at osc.edu Fri Aug 10 16:22:23 2018 From: djohnson at osc.edu (Doug Johnson) Date: Fri, 10 Aug 2018 11:22:23 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> Message-ID: Hi all, I want to chime in because this is precisely what we have done at OSC due to the same motivations Janell described. Our design was based in part on the guidelines in the "Petascale Data Protection" white paper from IBM. 
We only have ~200 filesets and 250M inodes today, but expect to grow. We are also very interested in details about performance issues and independent filesets. Can IBM elaborate? Best, Doug Martin Lischewski writes: > Hello Olaf, hello Marc, > > we in J?lich are in the middle of migrating/copying all our old filesystems which were created with filesystem > version: 13.23 (3.5.0.7) to new filesystems created with GPFS 5.0.1. > > We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. > 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota". > > The idea is to create a separate fileset for each group/project. For the users the quota-computation should be > much more transparent. From now on all data which is stored inside of their directory (fileset) counts for their > quota independent of the ownership. > > Right now we have round about 900 groups which means we will create round about 900 filesets per filesystem. > In one filesystem we will have about 400million inodes (with rising tendency). > > This filesystem we will back up with "mmbackup" so we talked with Dominic Mueller-Wicke and he recommended > us to use independent filesets. Because then the policy-runs can be parallelized and we can increase the backup > performance. We belive that we require these parallelized policies run to meet our backup performance targets. > > But there are even more features we enable by using independet filesets. E.g. "Fileset level snapshots" and "user > and group quotas inside of a fileset". > > I did not know about performance issues regarding independent filesets... Can you give us some more > information about this? > > All in all we are strongly supporting the idea of increasing this limit. > > Do I understand correctly that by opening a PMR IBM allows to increase this limit on special sides? I would rather > like to increase the limit and make it official public available and supported. > > Regards, > > Martin > > Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > > Hallo Stephan, > the limit is not a hard coded limit - technically spoken, you can raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot > performance issues, hangs ... at least noise and partial trouble .. > it might be not the case with your specific workload, because due to the fact, that you 're running already > close to 1000 ... > > I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a > number... > > ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what > uses cases should we test / support > I think , there might be a good chance for you , that for your specific workload, one would allow and support > more than 1000 > > do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you > please .. > then we can decide .. if raising the limit is an option for you .. 
> > Mit freundlichen Gr??en / Kind regards > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo > Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE > 99369940 > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron , Dorian Krause > > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit > Sent by: gpfsug-discuss-bounces at spectrumscale.org > --------------------------------------------------------------------------------------------------- > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will > soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. > > There are also a number of RFEs from other users open, that target this limitation: > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 > > I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again > some time to improve something else. > > If there are any other users on the list that are approaching the current limitation in independent filesets, > please take some time and vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Fri Aug 10 17:01:17 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 10 Aug 2018 16:01:17 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, <25008ae9da1649bb969592fdc0a5d6b5@jumptrading.com> Message-ID: <01780289b9e14e599f848f78b33998d8@jumptrading.com> Just as a follow up to my own note, Stephan, already provided a list of existing RFEs from which to vote through the IBM RFE site, cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Bryan Banister Sent: Friday, August 10, 2018 10:51 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ This is definitely a great candidate for a RFE, if one does not already exist. 
Not to try and contradict by friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general, which the RFE process is really the main way to do this. I just got off a call with Kristie and Carl about the RFE process and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!! So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE (admittedly currently got great) process really is and will be a great way to work together on these common goals and needs for the product we rely so heavily upon! Cheers!! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peinkofer, Stephan Sent: Friday, August 10, 2018 10:40 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. if raising the limit is an option for you .. 
Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke >, Uwe Tron >, Dorian Krause > Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. 
You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 10 18:15:34 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 10 Aug 2018 13:15:34 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, Message-ID: I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): -------------- next part -------------- An HTML attachment was scrubbed... URL: From anobre at br.ibm.com Fri Aug 10 19:10:35 2018 From: anobre at br.ibm.com (Anderson Ferreira Nobre) Date: Fri, 10 Aug 2018 18:10:35 +0000 Subject: [gpfsug-discuss] Top files on GPFS filesystem Message-ID: An HTML attachment was scrubbed... URL: From jake.carroll at uq.edu.au Sat Aug 11 03:18:28 2018 From: jake.carroll at uq.edu.au (Jake Carroll) Date: Sat, 11 Aug 2018 02:18:28 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Message-ID: Just to chime in on this... We have experienced a lot of problems as a result of the independent fileset limitation @ 1000. 
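For context, each such per-user "collection" is an independent fileset with its own inode space, provisioned roughly along these lines - a sketch only, with the file system name, fileset name, junction path and limits all invented for illustration; it is each of these per-fileset inode spaces that consumes a slot against the 1000 limit:

  # create an independent fileset with its own inode space and preallocated inodes
  mmcrfileset fs1 coll_jbloggs --inode-space new --inode-limit 1000000:500000
  # link it into the namespace and give it its own quota
  mmlinkfileset fs1 coll_jbloggs -J /gpfs/fs1/collections/jbloggs
  mmsetquota fs1:coll_jbloggs --block 10T:10T --files 1000000:1000000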
We have a very large campus wide deployment that relies upon filesets for collection management of large (and small) scientific data outputs. Every human who uses our GPFS AFM fabric gets a "collection", which is an independent fileset. Some may say this was an unwise design choice - but it was deliberate and related to security, namespace and inode isolation. It is a considered decision. Just not considered _enough_ given the 1000 fileset limit ;). We've even had to go as far as re-organising entire filesystems (splitting things apart) to sacrifice performance (less spindles for the filesets on top of a filesystem) to work around it - and sometimes spill into entirely new arrays. I've had it explained to me by internal IBM staff *why* it is hard to fix the fileset limits - and it isn't as straightforward as people think - especially in our case where each fileset is an AFM cache/home relationship - but we desperately need a solution. We logged an RFE. Hopefully others do, also. The complexity has been explained to me by a very good colleague who has helped us a great deal inside IBM (name withheld to protect the innocent) as a knock on effect of the computational overhead and expense of things _associated_ with independent filesets, like recursing a snapshot tree. So - it really isn't as simple as things appear on the surface - but it doesn't mean we shouldn't try to fix it, I suppose! We'd love to see this improved, too - as it's currently making things difficult. Happy to collaborate and work together on this, as always. -jc ---------------------------------------------------------------------- Message: 1 Date: Fri, 10 Aug 2018 11:22:23 -0400 From: Doug Johnson Hi all, I want to chime in because this is precisely what we have done at OSC due to the same motivations Janell described. Our design was based in part on the guidelines in the "Petascale Data Protection" white paper from IBM. We only have ~200 filesets and 250M inodes today, but expect to grow. We are also very interested in details about performance issues and independent filesets. Can IBM elaborate? Best, Doug Martin Lischewski writes: > Hello Olaf, hello Marc, > > we in J?lich are in the middle of migrating/copying all our old > filesystems which were created with filesystem > version: 13.23 (3.5.0.7) to new filesystems created with GPFS 5.0.1. > > We move to new filesystems mainly for two reasons: 1. We want to use the new increased number of subblocks. > 2. We have to change our quota from normal "group-quota per filesystem" to "fileset-quota". > > The idea is to create a separate fileset for each group/project. For > the users the quota-computation should be much more transparent. From > now on all data which is stored inside of their directory (fileset) counts for their quota independent of the ownership. > > Right now we have round about 900 groups which means we will create round about 900 filesets per filesystem. > In one filesystem we will have about 400million inodes (with rising tendency). > > This filesystem we will back up with "mmbackup" so we talked with > Dominic Mueller-Wicke and he recommended us to use independent > filesets. Because then the policy-runs can be parallelized and we can increase the backup performance. We belive that we require these parallelized policies run to meet our backup performance targets. > > But there are even more features we enable by using independet > filesets. E.g. "Fileset level snapshots" and "user and group quotas inside of a fileset". 
> > I did not know about performance issues regarding independent > filesets... Can you give us some more information about this? > > All in all we are strongly supporting the idea of increasing this limit. > > Do I understand correctly that by opening a PMR IBM allows to increase > this limit on special sides? I would rather like to increase the limit and make it official public available and supported. > > Regards, > > Martin > > Am 10.08.2018 um 14:51 schrieb Olaf Weiser: > > Hallo Stephan, > the limit is not a hard coded limit - technically spoken, you can raise it easily. > But as always, it is a question of test 'n support .. > > I've seen customer cases, where the use of much smaller amount of > independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. > it might be not the case with your specific workload, because due to > the fact, that you 're running already close to 1000 ... > > I suspect , this number of 1000 file sets - at the time of > introducing it - was as also just that one had to pick a number... > > ... turns out.. that a general commitment to support > 1000 > ind.fileset is more or less hard.. because what uses cases should we > test / support I think , there might be a good chance for you , that > for your specific workload, one would allow and support more than > 1000 > > do you still have a PMR for your side for this ? - if not - I know .. > open PMRs is an additional ...but could you please .. > then we can decide .. if raising the limit is an option for you .. > > Mit freundlichen Gr??en / Kind regards > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ---------------------------------------------------------------------- > --------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > > ---------------------------------------------------------------------- > --------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, > Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht > Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE > 99369940 > > From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke , Uwe Tron > , Dorian Krause > Date: 08/10/2018 01:29 PM > Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: > gpfsug-discuss-bounces at spectrumscale.org > ---------------------------------------------------------------------- > ----------------------------- > > Dear IBM and GPFS List, > > we at the Leibniz Supercomputing Centre and our GCS Partners from the > J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
> > There are also a number of RFEs from other users open, that target this limitation: > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 56780 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 120534 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 106530 > > https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID= > 85282 > > I know GPFS Development was very busy fulfilling the CORAL > requirements but maybe now there is again some time to improve something else. > > If there are any other users on the list that are approaching the > current limitation in independent filesets, please take some time and vote for the RFEs above. > > Many thanks in advance and have a nice weekend. > Best Regards, > Stephan Peinkofer > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Fri, 10 Aug 2018 16:01:17 +0000 From: Bryan Banister To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Message-ID: <01780289b9e14e599f848f78b33998d8 at jumptrading.com> Content-Type: text/plain; charset="iso-8859-1" Just as a follow up to my own note, Stephan, already provided a list of existing RFEs from which to vote through the IBM RFE site, cheers, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Bryan Banister Sent: Friday, August 10, 2018 10:51 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ This is definitely a great candidate for a RFE, if one does not already exist. Not to try and contradict by friend Olaf here, but I have been talking a lot with those internal to IBM, and the PMR process is for finding and correcting operational problems with the code level you are running, and closing out the PMR as quickly as possible. PMRs are not the vehicle for getting substantive changes and enhancements made to the product in general, which the RFE process is really the main way to do this. I just got off a call with Kristie and Carl about the RFE process and those on the list may know that we are working to improve this overall process. More will be sent out about this in the near future!! So I thought I would chime in on this discussion here to hopefully help us understand how important the RFE (admittedly currently got great) process really is and will be a great way to work together on these common goals and needs for the product we rely so heavily upon! Cheers!! -Bryan From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peinkofer, Stephan Sent: Friday, August 10, 2018 10:40 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Note: External Email ________________________________ Dear Olaf, I know that this is "just" a "support" limit. However Sven some day on a UG meeting in Ehningen told me that there is more to this than just adjusting your QA qualification tests since the way it is implemented today does not really scale ;). That's probably the reason why you said you see sometimes problems when you are not even close to the limit. 
So if you look at the 250PB Alpine file system of Summit today, that is what's going to deployed at more than one site world wide in 2-4 years and imho independent filesets are a great way to make this large systems much more handy while still maintaining a unified namespace. So I really think it would be beneficial if the architectural limit that prevents scaling the number of independent filesets could be removed at all. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Olaf Weiser > Sent: Friday, August 10, 2018 2:51 PM To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org; Doris Franke; Uwe Tron; Dorian Krause Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Hallo Stephan, the limit is not a hard coded limit - technically spoken, you can raise it easily. But as always, it is a question of test 'n support .. I've seen customer cases, where the use of much smaller amount of independent filesets generates a lot performance issues, hangs ... at least noise and partial trouble .. it might be not the case with your specific workload, because due to the fact, that you 're running already close to 1000 ... I suspect , this number of 1000 file sets - at the time of introducing it - was as also just that one had to pick a number... ... turns out.. that a general commitment to support > 1000 ind.fileset is more or less hard.. because what uses cases should we test / support I think , there might be a good chance for you , that for your specific workload, one would allow and support more than 1000 do you still have a PMR for your side for this ? - if not - I know .. open PMRs is an additional ...but could you please .. then we can decide .. if raising the limit is an option for you .. Mit freundlichen Gr??en / Kind regards Olaf Weiser EMEA Storage Competence Center Mainz, German / IBM Systems, Storage Platform, ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland IBM Allee 1 71139 Ehningen Phone: +49-170-579-44-66 E-Mail: olaf.weiser at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Cc: Doris Franke >, Uwe Tron >, Dorian Krause > Date: 08/10/2018 01:29 PM Subject: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear IBM and GPFS List, we at the Leibniz Supercomputing Centre and our GCS Partners from the J?lich Supercomputing Centre will soon be hitting the current Independent Fileset Limit of 1000 on a number of our GPFS Filesystems. 
There are also a number of RFEs from other users open, that target this limitation: https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=56780 Sign up for an IBM account www.ibm.com IBM account registration https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=120534 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=106530 https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=85282 I know GPFS Development was very busy fulfilling the CORAL requirements but maybe now there is again some time to improve something else. If there are any other users on the list that are approaching the current limitation in independent filesets, please take some time and vote for the RFEs above. Many thanks in advance and have a nice weekend. Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potent ial ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potent ial ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 29 ********************************************** From Stephan.Peinkofer at lrz.de Sat Aug 11 08:03:13 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Sat, 11 Aug 2018 07:03:13 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, , Message-ID: <28219001a90040d489e7269aa20fc4ae@lrz.de> Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Sun Aug 12 14:05:53 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Sun, 12 Aug 2018 09:05:53 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <28219001a90040d489e7269aa20fc4ae@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de>, , <28219001a90040d489e7269aa20fc4ae@lrz.de> Message-ID: That's interesting, I confess I never read that piece of documentation. 
What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Aug 13 07:10:04 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 13 Aug 2018 08:10:04 +0200 Subject: [gpfsug-discuss] Top files on GPFS filesystem In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 5698 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 360 bytes Desc: not available URL: From Stephan.Peinkofer at lrz.de Mon Aug 13 08:26:00 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Mon, 13 Aug 2018 07:26:00 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> Message-ID: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Dear Marc, OK, so let?s give it a try: [root at datdsst100 pr74qo]# mmlsfileset dsstestfs01 Filesets in file system 'dsstestfs01': Name Status Path root Linked /dss/dsstestfs01 ... quota_test_independent Linked /dss/dsstestfs01/quota_test_independent quota_test_dependent Linked /dss/dsstestfs01/quota_test_independent/quota_test_dependent [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_independent --user a2822bp --block 1G:1G --files 10:10 [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_dependent --user a2822bp --block 10G:10G --files 100:100 [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 0 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i Looks good ? [root at datdsst100 pr74qo]# cd /dss/dsstestfs01/quota_test_independent/quota_test_dependent/ [root at datdsst100 quota_test_dependent]# for foo in `seq 1 99`; do touch file${foo}; chown a2822bp:pr28fa file${foo}; done [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 99 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i So it seems that per fileset per user quota is really not depending on independence. But what is the documentation then meaning with: >>> User group and user quotas can be tracked at the file system level or per independent fileset. ??? However, there still remains the problem with mmbackup and mmapplypolicy ? And if you look at some of the RFEs, like the one from DESY, they want even more than 10k independent filesets ? Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. 
(TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen Tel: +49(0)89 35831-8715 Fax: +49(0)89 35831-9700 URL: http://www.lrz.de On 12. Aug 2018, at 15:05, Marc A Kaplan > wrote: That's interesting, I confess I never read that piece of documentation. What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" > To: gpfsug main discussion list > Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org > on behalf of Marc A Kaplan > Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. 
--------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Aug 13 08:52:55 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 13 Aug 2018 09:52:55 +0200 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Aug 13 16:12:32 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 13 Aug 2018 11:12:32 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... Like many things in life, sometimes compromises are necessary! From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/13/2018 03:26 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, OK, so let?s give it a try: [root at datdsst100 pr74qo]# mmlsfileset dsstestfs01 Filesets in file system 'dsstestfs01': Name Status Path root Linked /dss/dsstestfs01 ... 
quota_test_independent Linked /dss/dsstestfs01/quota_test_independent quota_test_dependent Linked /dss/dsstestfs01/quota_test_independent/quota_test_dependent [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_independent --user a2822bp --block 1G:1G --files 10:10 [root at datdsst100 pr74qo]# mmsetquota dsstestfs01:quota_test_dependent --user a2822bp --block 10G:10G --files 100:100 [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 pr74qo]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 0 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i Looks good ? [root at datdsst100 pr74qo]# cd /dss/dsstestfs01/quota_test_independent/quota_test_dependent/ [root at datdsst100 quota_test_dependent]# for foo in `seq 1 99`; do touch file${foo}; chown a2822bp:pr28fa file${foo}; done [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_dependent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_dependent USR 0 10485760 10485760 0 none | 99 100 100 0 none e root quota_test_dependent USR 0 0 0 0 none | 1 0 0 0 none i [root at datdsst100 quota_test_dependent]# mmrepquota -u -v dsstestfs01:quota_test_independent *** Report for USR quotas on dsstestfs01 Block Limits | File Limits Name fileset type KB quota limit in_doubt grace | files quota limit in_doubt grace entryType a2822bp quota_test_independent USR 0 1048576 1048576 0 none | 0 10 10 0 none e root quota_test_independent USR 0 0 0 0 none | 1 0 0 0 none i So it seems that per fileset per user quota is really not depending on independence. But what is the documentation then meaning with: >>> User group and user quotas can be tracked at the file system level or per independent fileset. ??? However, there still remains the problem with mmbackup and mmapplypolicy ? And if you look at some of the RFEs, like the one from DESY, they want even more than 10k independent filesets ? Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. (TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen Tel: +49(0)89 35831-8715 Fax: +49(0)89 35831-9700 URL: http://www.lrz.de On 12. Aug 2018, at 15:05, Marc A Kaplan wrote: That's interesting, I confess I never read that piece of documentation. What's also interesting, is that if you look at this doc for quotas: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_change_quota_anynum_users_onproject_basis_acrs_protocols.htm The word independent appears only once in a "Note": It is recommended to create an independent fileset for the project. AND if you look at the mmchfs or mmchcr command you see: --perfileset-quota Sets the scope of user and group quota limit checks to the individual fileset level, rather than to the entire file system. 
With no mention of "dependent" nor "independent"... From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/11/2018 03:03 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, so at least your documentation says: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1hlp_filesfilesets.htm >>> User group and user quotas can be tracked at the file system level or per independent fileset. But obviously as a customer I don't know if that "Really" depends on independence. Currently about 70% of our filesets in the Data Science Storage systems get backed up to ISP. But that number may change over time as it depends on the requirements of our projects. For them it is just selecting "Protect this DSS Container by ISP" in a Web form an our portal then automatically does all the provisioning of the ISP Node to one of our ISP servers, rolling out the new dsm config files to the backup workers and so on. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org < gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan < makaplan at us.ibm.com> Sent: Friday, August 10, 2018 7:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit I know quota stuff was cooked into GPFS before we even had "independent filesets"... So which particular quota features or commands or options now depend on "independence"?! Really? Yes, independent fileset performance for mmapplypolicy and mmbackup scales with the inodespace sizes. But I'm curious to know how many of those indy filesets are mmback-ed-up. Appreciate your elaborations, 'cause even though I've worked on some of this code, I don't know how/when/if customers push which limits. --------------------- Dear Marc, well the primary reasons for us are: - Per fileset quota (this seems to work also for dependent filesets as far as I know) - Per user per fileset quota (this seems only to work for independent filesets) - The dedicated inode space to speedup mmpolicy runs which only have to be applied to a specific subpart of the file system - Scaling mmbackup by backing up different filesets to different TSM Servers economically We have currently more than 1000 projects on our HPC machines and several different existing and planned file systems (use cases): _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Aug 13 19:48:20 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Mon, 13 Aug 2018 18:48:20 +0000 Subject: [gpfsug-discuss] TCP_QUICKACK Message-ID: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_QUICKACK socket flag on Linux? 
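For reference for anyone following along: TCP_QUICKACK is the Linux knob that momentarily disables delayed ACKs on a socket; an application can toggle it per connection with setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, ...), and, assuming a reasonably recent kernel and iproute2, there is also a per-route quickack attribute that forces quick ACKs without touching the application. A rough sketch of the route-level form - the subnet and interface names are made up, and whether this is appropriate depends on the delayed-ACK behaviour actually observed:

  # force quick ACKs for traffic towards the IPoIB subnet (illustrative addresses)
  ip route change 10.10.0.0/16 dev ib0 quickack 1
  # verify the attribute took effect
  ip route show 10.10.0.0/16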
I'm debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I'm curious if GPFS is explicitly doing this or if there's just a timing window in the RPC behavior that just makes it look that way. -Aaron From scale at us.ibm.com Mon Aug 13 20:25:44 2018 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 13 Aug 2018 15:25:44 -0400 Subject: [gpfsug-discuss] TCP_QUICKACK In-Reply-To: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> References: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> Message-ID: Hi Aaron, I just searched the core GPFS source code. I didn't find TCP_QUICKACK being used explicitly. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" To: gpfsug main discussion list Date: 08/13/2018 02:48 PM Subject: [gpfsug-discuss] TCP_QUICKACK Sent by: gpfsug-discuss-bounces at spectrumscale.org This is a question mostly for the devs, but really for anyone who can answer. Does GPFS use the TCP_QUICKACK socket flag on Linux? I'm debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I'm curious if GPFS is explicitly doing this or if there's just a timing window in the RPC behavior that just makes it look that way. -Aaron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Tue Aug 14 01:09:24 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 13 Aug 2018 17:09:24 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> Message-ID: <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> All, don't forget registration ends on the early side for this event due to background checks, etc. As noted below: IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Hope you'll be able to attend! Best, Kristy > On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose wrote: > > All, > > Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: > - the draft agenda (bottom of page), > - a link to registration, register by September 1 due to ORNL site requirements (see next line) > -
an important note about registration requirements for going to Oak Ridge National Lab > ? a request for your site presentations > ? information about HPCXXL and who to contact for information about joining, and > ? other upcoming events. > > Hope you can attend and see Summit and Alpine first hand. > > Best, > Kristy > > Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 > > IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. > > ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. > > About HPCXXL: > HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scalable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. > The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. > To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. > > Other upcoming GPFS/SS events: > Sep 19+20 HPCXXL, Oak Ridge > Aug 10 Meetup along TechU, Sydney > Oct 24 NYC User Meeting, New York > Nov 11 SC, Dallas > Dec 12 CIUK, Manchester > > > Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/
>
> Wednesday, September 19th, 2018 (times are start-end, speaker in parentheses)
> 13:00-13:15 Welcome (TBD)
> 13:15-13:45 What is new in Spectrum Scale? (Chris Maestas, IBM)
> 13:45-14:00 What is new in ESS? (TBD, IBM)
> 14:00-14:25 Spinning up a Hadoop cluster on demand (TBD, IBM)
> 14:25-14:50 Running Container on a Super Computer (John Lewars, IBM)
> 14:50-15:20 === BREAK ===
> 15:20-15:40 AWE (*** TO BE CONFIRMED ***)
> 15:40-16:00 CSCS site report (*** TO BE CONFIRMED ***)
> 16:00-16:20 Starfish (Sponsor talk) (TBD, Starfish)
> 16:20-16:50 Network Flow (John Lewars, IBM)
> 16:50-17:20 RFEs (Carl Zetie, IBM)
> 17:20-17:30 Wrap-up (TBD)
>
> Thursday, September 20th, 2018
> 08:30-08:50 Alpine - the Summit file system (TBD, ORNL)
> 08:50-09:20 Performance enhancements for CORAL (TBD, IBM)
> 09:20-09:40 ADIOS I/O library (William Godoy, ORNL)
> 09:40-10:00 AI Reference Architecture (Ted Hoover, IBM)
> 10:00-10:30 === BREAK ===
> 10:30-11:00 Encryption on the wire and at rest (Sandeep Ramesh, IBM)
> 11:00-11:30 Service Update (*** TO BE CONFIRMED ***)
> 11:30-12:00 Open Forum (All)
>
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Peinkofer at lrz.de Tue Aug 14 05:50:43 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Tue, 14 Aug 2018 04:50:43 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> Message-ID: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, what's then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don't remember exactly, but I think I've heard at some time that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a "job" on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Aug 14 07:08:55 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 14 Aug 2018 06:08:55 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de> <28219001a90040d489e7269aa20fc4ae@lrz.de> <75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> , <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Message-ID: <4830FF9B-A443-4508-A8ED-B023B6EDD15C@huk-coburg.de> +1 great answer Stephan. We also don't understand why functions exist if, every time we want to use them, the first step is to file a requirement. Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G.
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Am 14.08.2018 um 06:50 schrieb Peinkofer, Stephan >: Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 14 16:31:15 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 14 Aug 2018 11:31:15 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de> Message-ID: True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. 
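A minimal sketch of that mapping (file system name, mount point, and fileset names are hypothetical; check the mmcrfileset, mmlinkfileset and mmbackup syntax for your release before using it):

   # one independent fileset per backup group / backup server
   mmcrfileset fs1 bkup_grp01 --inode-space new
   mmlinkfileset fs1 bkup_grp01 -J /fs1/bkup_grp01

   # each project becomes a dependent fileset inside the matching group
   mmcrfileset fs1 proj0001 --inode-space bkup_grp01
   mmlinkfileset fs1 proj0001 -J /fs1/bkup_grp01/proj0001

   # one mmbackup run per independent fileset (per backup group)
   mmbackup /fs1/bkup_grp01 -t incremental --scope inodespace

Projects that should not be backed up would simply be created as dependent filesets under an independent fileset that no mmbackup job ever targets.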
mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Aug 15 12:07:45 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 15 Aug 2018 11:07:45 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? Message-ID: Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? 
Thanks Simon From r.sobey at imperial.ac.uk Wed Aug 15 13:56:28 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 15 Aug 2018 12:56:28 +0000 Subject: [gpfsug-discuss] 5.0.1 and HSM Message-ID: Hi all, Is anyone running HSM who has also upgraded to 5.0.1? I'd be interested to know if it work(s) or if you had to downgrade back to 5.0.0.X or even 4.2.3.X. Officially the website says not supported, but we've been told (not verbatim) there's no reason why it wouldn't. We really don't want to have to upgrade to a Scale 5 release that's already not receiving any more PTFs but we may have to. Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Aug 15 14:00:18 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 15 Aug 2018 13:00:18 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? In-Reply-To: References: Message-ID: Sorry, was able to download 5.0.1.1 DME just now, no issues. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson Sent: 15 August 2018 12:08 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] 5.0.1-2 release? Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Wed Aug 15 19:37:50 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 15 Aug 2018 18:37:50 +0000 Subject: [gpfsug-discuss] 5.0.1-2 release? Message-ID: <65E22DAC-1FCE-424D-BE95-4C0D841194E1@nuance.com> 5.0.1.2 is now on Fix Central. Bob Oesterlin Sr Principal Storage Engineer, Nuance ?On 8/15/18, 6:07 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson" wrote: Does anyone know if 5.0.1-2 is actually going to hit fix central? The release alert went out yesterday but the product is still not available. We've been waiting on it for a couple of weeks to fix an issue (we weren't offered an efix and were originally told it was due last week). Due to ongoing issues hitting some of our clients, we've had to take them out of service.... Related to fix central, is anyone else having issues with entitlement to download? We have DME licenses and can download standard edition 5.0, DME 4.2.3 but not DME 5.0.1... I got this fixed for my account, but others in my organisation/customer number don't seem to have access... Just wondering if this is just us, or others are having similar issues? 
Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=djjh8EKwHtOepW4Bjau0lKhLlu-DxM1dlgP0rrLsOzY&r=LPDewt1Z4o9eKc86MXmhqX-45Cz1yz1ylYELF9olLKU&m=OYGVn5hlqVYT-aqb8EERr85EEm8p19iHHWkSpX7AeKc&s=91moEFA-0zhZicJFFWDd4iO2Wt7GhhuaDi6yvZqigrI&e= From carlz at us.ibm.com Thu Aug 16 13:28:22 2018 From: carlz at us.ibm.com (Carl Zetie) Date: Thu, 16 Aug 2018 12:28:22 +0000 Subject: [gpfsug-discuss] Entitlements issues in Fix Central Message-ID: So... who wants to help us fix Fix Central? Two things: 1. I have seen a handful of issues in the last two weeks similar to what Simon and others have described: some versions of Scale download fine, others not. Some user IDs work, some get denied. And there is no obvious pattern or cause. We are looking at it, and more data points will help us track it down, so it would be a big help if everybody who encounters this reported it to Fix Central support: https://www.ibm.com/support/home/?lnk=fcw 2. An internal project is kicking off to improve Fix Central and Passport Advantage. If anybody would like to be a sponsor user in that project, contact me off-list. I can't guarantee participation, but I would love to get a couple of Scale users into the process. thanks, Carl Zetie Offering Manager for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From Dwayne.Hart at med.mun.ca Thu Aug 16 13:35:54 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 16 Aug 2018 12:35:54 +0000 Subject: [gpfsug-discuss] Entitlements issues in Fix Central In-Reply-To: References: Message-ID: <81C9FEC6-6BCF-433B-BEDB-B32A9B1A63B0@med.mun.ca> Hi Carl, I have access to both Fix Central and Passport Advantage. I?d like to assist in anyway I can. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 > On Aug 16, 2018, at 9:58 AM, Carl Zetie wrote: > > > So... who wants to help us fix Fix Central? > > Two things: > > 1. I have seen a handful of issues in the last two weeks similar to what Simon and others have described: some versions of Scale download fine, others not. Some user IDs work, some get denied. And there is no obvious pattern or cause. We are looking at it, and more data points will help us track it down, so it would be a big help if everybody who encounters this reported it to Fix Central support: > > https://www.ibm.com/support/home/?lnk=fcw > > > 2. An internal project is kicking off to improve Fix Central and Passport Advantage. If anybody would like to be a sponsor user in that project, contact me off-list. I can't guarantee participation, but I would love to get a couple of Scale users into the process. > > thanks, > > > > > > > > > > > > > > Carl Zetie > Offering Manager for Spectrum Scale, IBM > ---- > (540) 882 9353 ][ Research Triangle Park > carlz at us.ibm.com > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Stephan.Peinkofer at lrz.de Fri Aug 17 12:39:54 2018 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Fri, 17 Aug 2018 11:39:54 +0000 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? 
In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de> <65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de>, Message-ID: Dear Marc, well as I think I cannot simply "move" dependent filesets between independent ones and our customers must have the opportunity to change data protection policy for their Containers at any given time, I cannot map them to a "backed up" or "not backed up" independent fileset. So how much performance impact is lets say 1-10 exclude.dir directives per independent fileset? Many thanks in advance. Best Regards, Stephan Peinkofer ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Tuesday, August 14, 2018 5:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? 
;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Fri Aug 17 12:59:56 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 17 Aug 2018 07:59:56 -0400 Subject: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? In-Reply-To: References: <298030c14ce94fae8f21aefe9d736b84@lrz.de><28219001a90040d489e7269aa20fc4ae@lrz.de><75F43E7B-170F-47A7-8356-2FEC4C2D5AF3@lrz.de><65F6CC6E-A69D-4779-96EF-08EE5E23AC64@lrz.de>, Message-ID: My idea, not completely thought out, is that before you hit the 1000 limit, you start putting new customers or projects into dependent filesets and define those new dependent filesets within either a lesser number of independent filesets expressly created to receive the new customers OR perhaps even lump them with already existing independent filesets that have matching backup requirements. I would NOT try to create backups for each dependent fileset. But stick with the supported facilities to manage backup per independent... Having said that, if you'd still like to do backup per dependent fileset -- then have at it -- but test, test and retest.... And measure performance... My GUESS is that IF you can hack mmbackup or similar to use mmapplypolicy /path-to-dependent-fileset --scope fileset .... instead of mmapplypolicy /path-to-independent-fileset --scope inodespace .... You'll be okay because the inodescan where you end up reading some extra inodes is probably a tiny fraction of all the other IO you'll be doing! BUT I don't think IBM is in a position to encourage you to hack mmbackup -- it's already very complicated! From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/17/2018 07:40 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, well as I think I cannot simply "move" dependent filesets between independent ones and our customers must have the opportunity to change data protection policy for their Containers at any given time, I cannot map them to a "backed up" or "not backed up" independent fileset. So how much performance impact is lets say 1-10 exclude.dir directives per independent fileset? Many thanks in advance. Best Regards, Stephan Peinkofer From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Marc A Kaplan Sent: Tuesday, August 14, 2018 5:31 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? True, mmbackup is designed to work best backing up either a single independent fileset or the entire file system. So if you know some filesets do not need to be backed up, map them to one or more indepedent filesets that will not be backed up. mmapplypolicy is happy to scan a single dependent fileset, use option --scope fileset and make the primary argument the path to the root of the fileset you wish to scan. The overhead is not simply described. The directory scan phase will explore or walk the (sub)tree in parallel with multiple threads on multiple nodes, reading just the directory blocks that need to be read. The inodescan phase will read blocks of inodes from the given inodespace ... 
since the inodes of dependent filesets may be "mixed" into the same blocks as other dependend filesets that are in the same independent fileset, mmapplypolicy will incur what you might consider "extra" overhead. From: "Peinkofer, Stephan" To: gpfsug main discussion list Date: 08/14/2018 12:50 AM Subject: Re: [gpfsug-discuss] GPFS Independent Fileset Limit vs Quotas? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Marc, If you "must" exceed 1000 filesets because you are assigning each project to its own fileset, my suggestion is this: Yes, there are scaling/performance/manageability benefits to using mmbackup over independent filesets. But maybe you don't need 10,000 independent filesets -- maybe you can hash or otherwise randomly assign projects that each have their own (dependent) fileset name to a lesser number of independent filesets that will serve as management groups for (mm)backup... OK, if that might be doable, whats then the performance impact of having to specify Include/Exclude lists for each independent fileset in order to specify which dependent fileset should be backed up and which one not? I don?t remember exactly, but I think I?ve heard at some time, that Include/Exclude and mmbackup have to be used with caution. And the same question holds true for running mmapplypolicy for a ?job? on a single dependent fileset? Is the scan runtime linear to the size of the underlying independent fileset or are there some optimisations when I just want to scan a subfolder/dependent fileset of an independent one? Like many things in life, sometimes compromises are necessary! Hmm, can I reference this next time, when we negotiate Scale License pricing with the ISS sales people? ;) Best Regards, Stephan Peinkofer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Sat Aug 18 03:34:30 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 17 Aug 2018 22:34:30 -0400 Subject: [gpfsug-discuss] TCP_QUICKACK In-Reply-To: References: <024BF8AB-B747-4EE3-82C9-A746190F99A5@nasa.gov> Message-ID: <3de256a6-c8f0-3e44-baf8-3f32fb0c4a06@nasa.gov> Thanks! Appreciate the quick answer. On 8/13/18 3:25 PM, IBM Spectrum Scale wrote: > Hi Aaron, > > I just searched the core GPFS source code. I didn't find TCP_QUICKACKbeing used explicitly. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. > > The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for "Knister, Aaron S. 
(GSFC-606.2)[InuTeq, LLC]" ---08/13/2018 02:48:53 PM---This is a question mostly f"Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" ---08/13/2018 02:48:53 PM---This is a question mostly for the devs. but really for anyone who can answer. Does GPFS use the TCP_ > > From: "Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]" > To: gpfsug main discussion list > Date: 08/13/2018 02:48 PM > Subject: [gpfsug-discuss] TCP_QUICKACK > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > > > This is a question mostly for the devs. but really for anyone who can answer. > > Does GPFS use the TCP_QUICKACK socket flag on Linux? > > I?m debugging an IPoIB problem exacerbated by GPFS and based on the packet captures it seems as though the answer might be yes, but I?m curious if GPFS is explicitly doing this or if there?s just a timing window in the RPC behavior that just makes it look that way. > > -Aaron > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From david_johnson at brown.edu Mon Aug 20 17:55:18 2018 From: david_johnson at brown.edu (David Johnson) Date: Mon, 20 Aug 2018 12:55:18 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P Message-ID: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full. 
Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) ------------- -------------------- ------------------- (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Mon Aug 20 19:02:05 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Mon, 20 Aug 2018 14:02:05 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: That should do what you want. Be aware that mmrestripefs generates significant IO load so you should either use the QoS feature to mitigate its impact or run the command when the system is not very busy. Note you have two additional NSDs in the 33 failure group than you do in the 23 failure group. You may want to change one of those NSDs in failure group 33 to be in failure group 23 so you have equal storage space in both failure groups. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: David Johnson To: gpfsug main discussion list Date: 08/20/2018 12:55 PM Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P Sent by: gpfsug-discuss-bounces at spectrumscale.org I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full. Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) ------------- -------------------- ------------------- (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) Will the command "mmrestripfs /gpfs -b -P cit_10tb? 
move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, ? ddj Dave Johnson Brown University CCV/CIS_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Mon Aug 20 19:06:23 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 20 Aug 2018 14:06:23 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: Does anyone have a good rule of thumb for iops to allow for background QOS tasks? -- ddj Dave Johnson > On Aug 20, 2018, at 2:02 PM, Frederick Stock wrote: > > That should do what you want. Be aware that mmrestripefs generates significant IO load so you should either use the QoS feature to mitigate its impact or run the command when the system is not very busy. > > Note you have two additional NSDs in the 33 failure group than you do in the 23 failure group. You may want to change one of those NSDs in failure group 33 to be in failure group 23 so you have equal storage space in both failure groups. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: David Johnson > To: gpfsug main discussion list > Date: 08/20/2018 12:55 PM > Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. > The new half is only 50% full, and the old half is 94% full. > > Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) > d05_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.93G ( 0%) > d04_george_23 50.49T 23 No Yes 25.91T ( 51%) 18.9G ( 0%) > d03_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.12G ( 0%) > d02_george_23 50.49T 23 No Yes 25.9T ( 51%) 19.03G ( 0%) > d01_george_23 50.49T 23 No Yes 25.9T ( 51%) 18.92G ( 0%) > d00_george_23 50.49T 23 No Yes 25.91T ( 51%) 19.05G ( 0%) > d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.35G ( 0%) > d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.2G ( 0%) > d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) 69.93G ( 0%) > d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.11G ( 0%) > d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) 70.08G ( 0%) > d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.3G ( 0%) > d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) 70.25G ( 0%) > d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) 70.28G ( 0%) > ------------- -------------------- ------------------- > (pool total) 706.9T 180.1T ( 25%) 675.5G ( 0%) > > Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks from the _cit_ NSDs to the _george_ NSDs, > so that they end up all around 75% full? > > Thanks, > ? ddj > Dave Johnson > Brown University CCV/CIS_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Mon Aug 20 19:13:51 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Mon, 20 Aug 2018 11:13:51 -0700 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: Hey Dave, Can you say more about what you are trying to accomplish by doing the rebalance? IME, the performance hit from running the rebalance was higher than the performance hit from writes being directed to a subset of the disks. If you have any churn of the data, eventually they will rebalance anyway. Regards, Alex On Mon, Aug 20, 2018 at 11:06 AM wrote: > Does anyone have a good rule of thumb for iops to allow for background QOS > tasks? > > > > -- ddj > Dave Johnson > > On Aug 20, 2018, at 2:02 PM, Frederick Stock wrote: > > That should do what you want. Be aware that mmrestripefs generates > significant IO load so you should either use the QoS feature to mitigate > its impact or run the command when the system is not very busy. > > Note you have two additional NSDs in the 33 failure group than you do in > the 23 failure group. You may want to change one of those NSDs in failure > group 33 to be in failure group 23 so you have equal storage space in both > failure groups. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: David Johnson > To: gpfsug main discussion list > Date: 08/20/2018 12:55 PM > Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I have one storage pool that was recently doubled, and another pool > migrated there using mmapplypolicy. > The new half is only 50% full, and the old half is 94% full. > > Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB) > d05_george_23 50.49T 23 No Yes 25.91T ( 51%) > 18.93G ( 0%) > d04_george_23 50.49T 23 No Yes 25.91T ( 51%) > 18.9G ( 0%) > d03_george_23 50.49T 23 No Yes 25.9T ( 51%) > 19.12G ( 0%) > d02_george_23 50.49T 23 No Yes 25.9T ( 51%) > 19.03G ( 0%) > d01_george_23 50.49T 23 No Yes 25.9T ( 51%) > 18.92G ( 0%) > d00_george_23 50.49T 23 No Yes 25.91T ( 51%) > 19.05G ( 0%) > d06_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.35G ( 0%) > d07_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.2G ( 0%) > d05_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 69.93G ( 0%) > d04_cit_33 50.49T 33 No Yes 3.085T ( 6%) > 70.11G ( 0%) > d03_cit_33 50.49T 33 No Yes 3.084T ( 6%) > 70.08G ( 0%) > d02_cit_33 50.49T 33 No Yes 3.083T ( 6%) > 70.3G ( 0%) > d01_cit_33 50.49T 33 No Yes 3.085T ( 6%) > 70.25G ( 0%) > d00_cit_33 50.49T 33 No Yes 3.083T ( 6%) > 70.28G ( 0%) > ------------- -------------------- > ------------------- > (pool total) 706.9T 180.1T ( 25%) > 675.5G ( 0%) > > Will the command "mmrestripfs /gpfs -b -P cit_10tb? move the data blocks > from the _cit_ NSDs to the _george_ NSDs, > so that they end up all around 75% full? > > Thanks, > ? 
ddj > Dave Johnson > Brown University CCV/CIS_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Mon Aug 20 23:08:28 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 20 Aug 2018 18:08:28 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID: <23047.1534802908@turing-police.cc.vt.edu> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: > Note you have two additional NSDs in the 33 failure group than you do in > the 23 failure group. You may want to change one of those NSDs in failure > group 33 to be in failure group 23 so you have equal storage space in both > failure groups. Keep in mind that the failure groups should be built up based on single points of failure. In other words, a failure group should consist of disks that will all stay up or all go down on the same failure (controller, network, whatever). Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', it sounds very likely that they are in two different storage arrays, and you should make your failure groups so they don't span a storage array. In other words, taking a 'cit' disk and moving it into a 'george' failure group will Do The Wrong Thing, because if you do data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk that's in the same array as the 'george' disk. If 'george' fails, you lose access to both replicas. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From david_johnson at brown.edu Mon Aug 20 23:21:08 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Mon, 20 Aug 2018 18:21:08 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <23047.1534802908@turing-police.cc.vt.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> <23047.1534802908@turing-police.cc.vt.edu> Message-ID: Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. I think we may leave things alone for now regarding the original question, rebalancing this pool. -- ddj Dave Johnson > On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: > > On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: > >> Note you have two additional NSDs in the 33 failure group than you do in >> the 23 failure group. 
You may want to change one of those NSDs in failure >> group 33 to be in failure group 23 so you have equal storage space in both >> failure groups. > > Keep in mind that the failure groups should be built up based on single points of failure. > In other words, a failure group should consist of disks that will all stay up or all go down on > the same failure (controller, network, whatever). > > Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', > it sounds very likely that they are in two different storage arrays, and you should make your > failure groups so they don't span a storage array. In other words, taking a 'cit' disk > and moving it into a 'george' failure group will Do The Wrong Thing, because if you do > data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk > that's in the same array as the 'george' disk. If 'george' fails, you lose access to both > replicas. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From aaron.s.knister at nasa.gov Tue Aug 21 01:05:07 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Tue, 21 Aug 2018 00:05:07 +0000 Subject: [gpfsug-discuss] fcntl ENOTTY Message-ID: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> Nothing worse than a vague question with little context, eh? Well... Does anyone know why GPFS might return ENOTTY to an fcntl(fd, F_SETLKW, &lock) where lock.l_type is set to F_RDLCK? The error prompting this question looks almost identical to the one in this (unfortunately unanswered) thread: http://www.spectrumscale.org/pipermail/gpfsug-discuss/2014-June/000412.html -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Aug 21 04:28:19 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 20 Aug 2018 23:28:19 -0400 Subject: [gpfsug-discuss] fcntl ENOTTY In-Reply-To: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> References: <2DAB9816-7DEE-4890-9045-489692D2BA6A@nasa.gov> Message-ID: <5e34373c-d6ff-fca7-4254-64958f636b69@nasa.gov> Argh... Please disregard (I think). Apparently, mpich uses "%X" to format errno (oh yeah, sure, why not use %p to print strings while we're at it) which means that the errno is *actually* 37 which is ENOLCK. Ok, now there's something I can work with. -Aaron p.s. I'm sure that formatting errno with %X made sense at the time (ok, no I'm not), but it sent me down a hell of a rabbit hole and I'm just bitter. No offense intended. On 8/20/18 8:05 PM, Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC] wrote: > Nothing worse than a vague question with little context, eh? Well... > > Does anyone know why GPFS might return ENOTTY to an fcntl(fd, F_SETLKW, &lock) where lock.l_type is set to F_RDLCK? 
> > The error prompting this question looks almost identical to the one in this (unfortunately unanswered) thread: > > http://www.spectrumscale.org/pipermail/gpfsug-discuss/2014-June/000412.html > > -Aaron > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From luis.bolinches at fi.ibm.com Tue Aug 21 05:11:24 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Tue, 21 Aug 2018 04:11:24 +0000 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: Message-ID: Hi You can enable QoS first to see the activity while on inf value to see the current values of usage and set the li is later on. Those limits are modificable online so even in case you have (not your case it seems) less activity times those can be increased for replication then and Lowe again on peak times. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 21 Aug 2018, at 1.21, david_johnson at brown.edu wrote: > > Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. > > I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. > > I think we may leave things alone for now regarding the original question, rebalancing this pool. > > -- ddj > Dave Johnson > >> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: >> >> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: >> >>> Note you have two additional NSDs in the 33 failure group than you do in >>> the 23 failure group. You may want to change one of those NSDs in failure >>> group 33 to be in failure group 23 so you have equal storage space in both >>> failure groups. >> >> Keep in mind that the failure groups should be built up based on single points of failure. >> In other words, a failure group should consist of disks that will all stay up or all go down on >> the same failure (controller, network, whatever). >> >> Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', >> it sounds very likely that they are in two different storage arrays, and you should make your >> failure groups so they don't span a storage array. In other words, taking a 'cit' disk >> and moving it into a 'george' failure group will Do The Wrong Thing, because if you do >> data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk >> that's in the same array as the 'george' disk. If 'george' fails, you lose access to both >> replicas. 
>> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Tue Aug 21 15:48:15 2018 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Tue, 21 Aug 2018 14:48:15 +0000 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data In-Reply-To: References: <83A6EEB0EC738F459A39439733AE80452672ADC8@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE804526743F1B@MBX114.d.ethz.ch> More precisely the problem is the following: If I set period=1 for a "rate" sensor (network speed, NSD read/write speed, PDisk read/write speed) everything is correct because every second the sensors get the valuess of the cumulative counters (and do not divide it by 1, which is not affecting anything for 1 second). If I set the period=2, the "rate" sensors collect the values from the cumulative counters every two seconds but they do not divide by 2 those values (because pmsensors do not actually divide; they seem to silly report what they read which is understand-able from a performance point of view); then grafana receives as double as the real speed. I've to correct myself: here the point is not how sampling/downsampling is done by grafana/grafana-bridge/whatever as I wrongly wrote in my first email. The point is: if I collect data every N seconds (because I do not want to overloads the pmcollector node), how can I divide (in grafana) the reported collected data by N to get real avg speed in that N-seconds time interval ?? At the moment it seems that the only option is using N=1, which is bad because, as I stated, it overloads the collector when many nodes run many pmsensors... A ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com] Sent: Friday, July 27, 2018 8:27 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] How Zimon/Grafana-bridge process data Hi, as there are more often similar questions rising, we just put an article about the topic on the Spectrum Scale Wiki https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data While there will be some minor updates on the article in the next time, it might already explain your questions. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. 
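(As plain arithmetic: a cumulative counter sampled every N seconds produces a delta that covers N seconds, so the average rate over that interval is delta / N. With made-up numbers: a delta of 2,400 MB over a 2-second period means 2,400 MB / 2 s = 1,200 MB/s; plotting the raw delta as if it were a per-second value would show 2,400 MB/s, twice the real rate, and with a 4-second period it would show four times the real rate unless the delta is divided by 4.)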
The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. [Inactive hide details for "Dorigo Alvise (PSI)" ---13.07.2018 12:08:59---Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 s]"Dorigo Alvise (PSI)" ---13.07.2018 12:08:59---Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 13.07.2018 12:08 Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, I've a GL2 cluster based on gpfs 4.2.3-6, with 1 support node and 2 IO/NSD nodes. I've the following perfmon configuration for the metric-group GPFSNSDDisk: { name = "GPFSNSDDisk" period = 2 restrict = "nsdNodes" }, that, as far as I know sends data to the collector every 2 seconds (correct ?). But how ? does it send what it reads from the counter every two seconds ? or does it aggregated in some way ? or what else ? In the collector node pmcollector, grafana-bridge and grafana-server run. Now I need to understand how to play with the grafana parameters: - Down sample (or Disable downsampling) - Aggregator (following on the same row the metrics). See attached picture 4s.png as reference. In the past I had the period set to 1. And grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with aggregator set to "sum", which AFAIK means "sum all that metrics that match the filter below" (again see the attached picture to see how the filter is set to only collect data from the IO nodes). Today I've changed to "period=2"... and grafana started to display funny data rate (the double, or quad of the real rate). I had to play (almost randomly) with "Aggregator" (from sum to avg, which as fas as I undestand doesn't mean anything in my case... average between the two IO nodes ? or what ?) and "Down sample" (from empty to 2s, and then to 4s) to get back real data rate which is compliant with what I do get with dstat. Can someone kindly explain how to play with these parameters when zimon sensor's period is changed ? Many thanks in advance Regards, Alvise Dorigo[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From makaplan at us.ibm.com Tue Aug 21 16:42:37 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 11:42:37 -0400 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P; using QOS features In-Reply-To: References: Message-ID: (Aside from QOS, I second the notion to review your "failure groups" if you are using and depending on data replication.) For QOS, some suggestions: You might want to define a set of nodes that will do restripes using `mmcrnodeclass restripers -N ...` You can initially just enable `mmchqos FS --enable` and then monitor performance of your restripefs command `mmrestripefs FS -b -N restripers` that restricts operations to the restripers nodeclass. 
with `mmlsqos FS --seconds 60 [[see other options]]` Suppose you see an average iops rates of several thousand IOPs and you decide that is interfering with other work... Then, for example, you could "slow down" or "pace" mmrestripefs to use 999 iops within the system pool and 1999 iops within the data pool with: mmchqos FS --enable -N restripers pool=system,maintenance=999iops pool=data,maintenance=1999iops And monitor that with mmlsqos. Tip: For a more graphical view of QOS and disk performance, try samples/charts/qosplotfine.pl. You will need to have gnuplot working... If you are "into" performance tools you might want to look at the --fine-stats options of mmchqos and mmlsqos and plug that into your favorite performance viewer/plotter/analyzer tool(s). (Technical: mmlsqos --fine-stats is written to be used and digested by scripts, no so much for human "eyeballing". The --fine-stats argument of mmchqos is a number of seconds. The --fine-stats argument of mmlsqos is one or two index values. The doc for mmlsqos explains this and the qosplotfine.pl script is an example of how to use it. ) From: "Luis Bolinches" To: "gpfsug main discussion list" Date: 08/21/2018 12:56 AM Subject: Re: [gpfsug-discuss] Rebalancing with mmrestripefs -P Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi You can enable QoS first to see the activity while on inf value to see the current values of usage and set the li is later on. Those limits are modificable online so even in case you have (not your case it seems) less activity times those can be increased for replication then and Lowe again on peak times. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 21 Aug 2018, at 1.21, david_johnson at brown.edu wrote: > > Yes the arrays are in different buildings. We want to spread the activity over more servers if possible but recognize the extra load that rebalancing would entail. The system is busy all the time. > > I have considered using QOS when we run policy migrations but haven?t yet because I don?t know what value to allow for throttling IOPS. We need to do weekly migrations off of 15k rpm pool onto 7.2k rpm pool, and previously I?ve just let it run at native speed. I?d like to know what other folks have used for QOS settings. > > I think we may leave things alone for now regarding the original question, rebalancing this pool. > > -- ddj > Dave Johnson > >> On Aug 20, 2018, at 6:08 PM, valdis.kletnieks at vt.edu wrote: >> >> On Mon, 20 Aug 2018 14:02:05 -0400, "Frederick Stock" said: >> >>> Note you have two additional NSDs in the 33 failure group than you do in >>> the 23 failure group. You may want to change one of those NSDs in failure >>> group 33 to be in failure group 23 so you have equal storage space in both >>> failure groups. >> >> Keep in mind that the failure groups should be built up based on single points of failure. >> In other words, a failure group should consist of disks that will all stay up or all go down on >> the same failure (controller, network, whatever). >> >> Looking at the fact that you have 6 disks named 'dNN_george_33' and 8 named 'dNN_cit_33', >> it sounds very likely that they are in two different storage arrays, and you should make your >> failure groups so they don't span a storage array. 
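Pulling together the QOS pacing steps Marc Kaplan outlines above into one place, a minimal sketch might look like the following (the file system name fs0 and the node names are placeholders; the pool names and maintenance iops caps are the example figures from his message):

    # Define a node class for the nodes that will do the restripe work (node names are hypothetical)
    mmcrnodeclass restripers -N nsd1,nsd2
    # Enable QOS with no limits yet, and run the rebalance restricted to that node class
    mmchqos fs0 --enable
    mmrestripefs fs0 -b -N restripers
    # From another session, watch the iops the maintenance class is actually consuming
    mmlsqos fs0 --seconds 60
    # If that interferes with user work, cap maintenance iops per pool, then keep monitoring with mmlsqos
    mmchqos fs0 --enable -N restripers pool=system,maintenance=999iops pool=data,maintenance=1999iops

This is only a sketch of the sequence described in the message, not a verified procedure; check the mmchqos and mmlsqos man pages for the exact options available at your code level.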
In other words, taking a 'cit' disk >> and moving it into a 'george' failure group will Do The Wrong Thing, because if you do >> data replication, one copy can go onto a 'george' disk, and the other onto a 'cit' disk >> that's in the same array as the 'george' disk. If 'george' fails, you lose access to both >> replicas. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Richard.E.Powell at boeing.com Tue Aug 21 19:23:50 2018 From: Richard.E.Powell at boeing.com (Powell (US), Richard E) Date: Tue, 21 Aug 2018 18:23:50 +0000 Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Hi all, I'm trying to use the "GROUP POOL" feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I'm having is that it seems to be identifying the candidates correctly but, anytime I use the "group pool" name for the "to pool", it only selects the first candidate for migration. If I specify a single pool name for the "to pool", it selects multiple files as expected. Here are the policy rules I'm using: RULE 'gp' GROUP POOL 'gpool' is 'ssd' then 'disk1' RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT) I'm not sure if I'm misunderstanding something or if this is a real bug. I'm just wondering if anyone else has run into this issue? I'm running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 21 20:45:10 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 15:45:10 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> References: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Message-ID: Migrate to a group pool "repacks" the selected files over the pools that comprise the group IN THE ORDER SPECIFIED UP TO THE SPECIFIED LIMIT for each pool. To see this work, in your case, set a limit that is near the current occupancy of pool 'ssd'. For example: RULE ?gp? GROUP POOL ?gpool? is ?ssd? LIMIT(50) then ?disk1? Notice the documentation says the LIMIT defaults to 99. Also, if you've run the same policy before and nothings changed much, then of course, there's not going to be much "repacking" to be done, maybe not any. If the behaviour still doesn't make sense to you, try testing on a tiny file system with just a few small pools, sizing pools and files so that only a few files will fit in a pool... If you build such a test scenario and that still doesn't make sense, show us the example... ----------------------------------- From: "Powell (US), Richard E" Hi all, I?m trying to use the ?GROUP POOL? 
feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I?m having is that it seems to be identifying the candidates correctly but, anytime I use the ?group pool? name for the ?to pool?, it only selects the first candidate for migration. If I specify a single pool name for the ?to pool?, it selects multiple files as expected. Here are the policy rules I?m using: RULE ?gp? GROUP POOL ?gpool? is ?ssd? then ?disk1? RULE ?repack? MIGRATE FROM POOL ?gpool? TO POOL ?gpool? WEIGHT(FILE_HEAT) I?m not sure if I?m misunderstanding something or if this is a real bug. I?m just wondering if anyone else has run into this issue? I?m running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Aug 21 21:11:10 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 21 Aug 2018 16:11:10 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: References: <7a0a914601594ccdb6c96504322de9c8@XCH15-09-11.nw.nos.boeing.com> Message-ID: To repack in random order, which might be an interesting and easy way to test and demonstrate... Use the RAND() function: RULE ... MIGRATE ... WEIGHT(RAND()) ... -L 3 on the mmapplypolicy command will make the random weights evident in the output. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Aug 22 18:12:24 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 22 Aug 2018 17:12:24 +0000 Subject: [gpfsug-discuss] Those users.... Message-ID: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Sometimes, I look at the data that's being stored in my file systems and just shake my head: /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Wed Aug 22 19:17:01 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Wed, 22 Aug 2018 14:17:01 -0400 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <422107E9-0AD1-49F8-99FD-D6713F90A844@ulmer.org> Clearly, those are the ones they?re working on. You?re lucky they?re de-duped. -- Stephen > On Aug 22, 2018, at 1:12 PM, Oesterlin, Robert wrote: > > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > 507-269-0413 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From linesr at janelia.hhmi.org Wed Aug 22 19:54:22 2018 From: linesr at janelia.hhmi.org (Lines, Robert) Date: Wed, 22 Aug 2018 18:54:22 +0000 Subject: [gpfsug-discuss] Those users.... Message-ID: Make a better storage system and they will find a better way to abuse it. A PI during an annual talk to the facility: Because databases are hard and file systems have done a far better job of scaling we have implemented our datastore using files, file name and directory names. 
It handles the high concurrency far better than any database server we could have built for the amount we are charged for that same very tiny amount of data. Ignoring that the internal pricing for storage is based on sane usage and not packing your entire data set into small enough files that it all lives in the SSD tier. So I feel for you. Rob From: on behalf of "Oesterlin, Robert" Reply-To: gpfsug main discussion list Date: Wednesday, August 22, 2018 at 1:12 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Those users.... Sometimes, I look at the data that's being stored in my file systems and just shake my head: /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) Bob Oesterlin Sr Principal Storage Engineer, Nuance 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Wed Aug 22 20:32:56 2018 From: bipcuds at gmail.com (Keith Ball) Date: Wed, 22 Aug 2018 15:32:56 -0400 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Message-ID: Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From Richard.E.Powell at boeing.com Wed Aug 22 21:17:44 2018 From: Richard.E.Powell at boeing.com (Powell (US), Richard E) Date: Wed, 22 Aug 2018 20:17:44 +0000 Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: Allow me to elaborate on my question. The example I gave was trimmed-down to the minimum. I've been trying various combinations with different LIMIT values and different weight and where clauses, using '-I test' and '-I prepare' to see what it would do, but not actually doing the migration. The 'ssd' pool is about 36% utilized and I've been starting the mmapplypolicy scan at a sub-directory level where nearly all the files were in the disk pool. (You'll just have to trust me that the ssd pool can hold all of them :-)) If I specify 'ssd' as the "to pool", the output from the test or prepare options indicates that it would be able to migrate all of the candidate files to the ssd pool. But, if I specify the group pool as the "to pool", it is only willing to migrate the first candidate. That is with the ssd pool listed first in the group and with any limit as long as it's big enough to hold the current data plus the files I expected it to select, even the default of 99. I'm sure I'm either doing something wrong, or I *really* misunderstand the concept. It seems straight forward enough.... Thanks to everyone for your time! 
Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of gpfsug-discuss-request at spectrumscale.org Sent: Wednesday, August 22, 2018 4:00 AM To: gpfsug-discuss at spectrumscale.org Subject: gpfsug-discuss Digest, Vol 79, Issue 47 Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Problem using group pool for migration (Powell (US), Richard E) 2. Re: Problem using group pool for migration (Marc A Kaplan) 3. Re: Problem using group pool for migration (Marc A Kaplan) ---------------------------------------------------------------------- Message: 1 Date: Tue, 21 Aug 2018 18:23:50 +0000 From: "Powell (US), Richard E" To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Problem using group pool for migration Message-ID: <7a0a914601594ccdb6c96504322de9c8 at XCH15-09-11.nw.nos.boeing.com> Content-Type: text/plain; charset="us-ascii" Hi all, I'm trying to use the "GROUP POOL" feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. The problem I'm having is that it seems to be identifying the candidates correctly but, anytime I use the "group pool" name for the "to pool", it only selects the first candidate for migration. If I specify a single pool name for the "to pool", it selects multiple files as expected. Here are the policy rules I'm using: RULE 'gp' GROUP POOL 'gpool' is 'ssd' then 'disk1' RULE 'repack' MIGRATE FROM POOL 'gpool' TO POOL 'gpool' WEIGHT(FILE_HEAT) I'm not sure if I'm misunderstanding something or if this is a real bug. I'm just wondering if anyone else has run into this issue? I'm running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 21 Aug 2018 15:45:10 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem using group pool for migration Message-ID: Content-Type: text/plain; charset="utf-8" Migrate to a group pool "repacks" the selected files over the pools that comprise the group IN THE ORDER SPECIFIED UP TO THE SPECIFIED LIMIT for each pool. To see this work, in your case, set a limit that is near the current occupancy of pool 'ssd'. For example: RULE ?gp? GROUP POOL ?gpool? is ?ssd? LIMIT(50) then ?disk1? Notice the documentation says the LIMIT defaults to 99. Also, if you've run the same policy before and nothings changed much, then of course, there's not going to be much "repacking" to be done, maybe not any. If the behaviour still doesn't make sense to you, try testing on a tiny file system with just a few small pools, sizing pools and files so that only a few files will fit in a pool... If you build such a test scenario and that still doesn't make sense, show us the example... ----------------------------------- From: "Powell (US), Richard E" Hi all, I?m trying to use the ?GROUP POOL? feature for file migration with FILE_HEAT, similar to one of the ilm sample scripts. 
The problem I?m having is that it seems to be identifying the candidates correctly but, anytime I use the ?group pool? name for the ?to pool?, it only selects the first candidate for migration. If I specify a single pool name for the ?to pool?, it selects multiple files as expected. Here are the policy rules I?m using: RULE ?gp? GROUP POOL ?gpool? is ?ssd? then ?disk1? RULE ?repack? MIGRATE FROM POOL ?gpool? TO POOL ?gpool? WEIGHT(FILE_HEAT) I?m not sure if I?m misunderstanding something or if this is a real bug. I?m just wondering if anyone else has run into this issue? I?m running 4.2.3.8 on RHEL 6. Thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Tue, 21 Aug 2018 16:11:10 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem using group pool for migration Message-ID: Content-Type: text/plain; charset="us-ascii" To repack in random order, which might be an interesting and easy way to test and demonstrate... Use the RAND() function: RULE ... MIGRATE ... WEIGHT(RAND()) ... -L 3 on the mmapplypolicy command will make the random weights evident in the output. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 79, Issue 47 ********************************************** From valdis.kletnieks at vt.edu Wed Aug 22 21:35:57 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 22 Aug 2018 16:35:57 -0400 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <168045.1534970157@turing-police.cc.vt.edu> On Wed, 22 Aug 2018 17:12:24 -0000, "Oesterlin, Robert" said: > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) I've got 114,029 files of the form: /gpfs/archive/tenant/this/that/F:\the\other\thing\what\where\they\thinking/apparently/not/much.dat I admit being mystified - how does such a mess happen? (Note that our tenant users are only able to access the GPFS filesystem through NFS - which is only exported to other Linux systems....) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From jfosburg at mdanderson.org Wed Aug 22 21:44:29 2018 From: jfosburg at mdanderson.org (Fosburgh,Jonathan) Date: Wed, 22 Aug 2018 20:44:29 +0000 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <168045.1534970157@turing-police.cc.vt.edu> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <168045.1534970157@turing-police.cc.vt.edu> Message-ID: <42A96B62-CD95-458B-A702-F6ECFAC66AEF@mdanderson.org> A very, very long time ago we had an AIX system (4.3 with jfs1) where the users logged in interactively. We would find files with names like: /C:\some\very \non-posix\path/file There's a reason they're called lusers. 
?On 8/22/18, 3:36 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of valdis.kletnieks at vt.edu" wrote: On Wed, 22 Aug 2018 17:12:24 -0000, "Oesterlin, Robert" said: > Sometimes, I look at the data that's being stored in my file systems and just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains 17,967,350 files (in ONE directory) I've got 114,029 files of the form: /gpfs/archive/tenant/this/that/F:\the\other\thing\what\where\they\thinking/apparently/not/much.dat I admit being mystified - how does such a mess happen? (Note that our tenant users are only able to access the GPFS filesystem through NFS - which is only exported to other Linux systems....) The information contained in this e-mail message may be privileged, confidential, and/or protected from disclosure. This e-mail message may contain protected health information (PHI); dissemination of PHI should comply with applicable federal and state laws. If you are not the intended recipient, or an authorized representative of the intended recipient, any further review, disclosure, use, dissemination, distribution, or copying of this message or any attachment (or the information contained therein) is strictly prohibited. If you think that you have received this e-mail message in error, please notify the sender by return e-mail and delete all references to it and its contents from your systems. From jonathan.buzzard at strath.ac.uk Wed Aug 22 23:37:55 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 22 Aug 2018 23:37:55 +0100 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: On 22/08/18 18:12, Oesterlin, Robert wrote: > Sometimes, I look at the data that's being stored in my file systems and > just shake my head: > > /gpfs//Restricted/EventChangeLogs/deduped/working contains > 17,967,350 files (in ONE directory) > That's what inode quota's are for. Set it pretty high to begin with, say one million. That way the vast majority of users have no issues ever. Then the troublesome few will have issues at which point you can determine why they are storing so many files, and appropriately educate them on better ways to do it. Finally if they really need that many files just charge them for it :-) Having lots of files has a cost just like having lots of data has a cost, and it's not fair for the reasonable users to subsidize them. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From abeattie at au1.ibm.com Thu Aug 23 00:02:28 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 22 Aug 2018 23:02:28 +0000 Subject: [gpfsug-discuss] Those users.... In-Reply-To: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Aug 23 01:59:11 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 22 Aug 2018 17:59:11 -0700 Subject: [gpfsug-discuss] Those users.... 
In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <20180823005911.GA5982@almaren> On Wed, Aug 22, 2018 at 11:37:55PM +0100, Jonathan Buzzard wrote: > On 22/08/18 18:12, Oesterlin, Robert wrote: > >Sometimes, I look at the data that's being stored in my file systems and > >just shake my head: > > > >/gpfs//Restricted/EventChangeLogs/deduped/working contains > >17,967,350 files (in ONE directory) > > > > That's what inode quota's are for. Set it pretty high to begin with, say one > million. That way the vast majority of users have no issues ever. Then the > troublesome few will have issues at which point you can determine why they > are storing so many files, and appropriately educate them on better ways to > do it. Finally if they really need that many files just charge them for it > :-) Having lots of files has a cost just like having lots of data has a > cost, and it's not fair for the reasonable users to subsidize them. Yep, we set our fileset inode quota to 1 million/TB of allocated space. It seems overly generous to me but it's far better than no limit at all. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From rohwedder at de.ibm.com Thu Aug 23 09:51:39 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Thu, 23 Aug 2018 10:51:39 +0200 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: Message-ID: Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Keith Ball To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: ??? ??????? ??? but it appears that port 80 specifically is used also by the GUI's Web service. 
I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, ? Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 15917110.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Juri.Haberland at rohde-schwarz.com Thu Aug 23 10:24:38 2018 From: Juri.Haberland at rohde-schwarz.com (Juri Haberland) Date: Thu, 23 Aug 2018 09:24:38 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Message-ID: Hello Markus, I?m not sure how to interpret your answer: Do the internal processes connect to the non-privileged ports (47443 and 47080) or the privileged ports? If they use the privileged ports we would appreciate it if IBM could change that behavior to using the non-privileged ports so one could change the privileged ones or use a httpd server in front of the GUI web service. We are going to need this in the near future as well? Thanks & kind regards. Juri Haberland -- Juri Haberland R&D SW File Based Media Solutions | 7TF1 Rohde & Schwarz GmbH & Co. KG Hanomaghof 1 | 30449 Hannover Phone: +49 511 678 07 246 | Fax: +49 511 678 07 200 Internet: www.rohde-schwarz.com Gesch?ftsf?hrung / Executive Board: Christian Leicher (Vorsitzender / Chairman), Peter Riedel, Sitz der Gesellschaft / Company's Place of Business: M?nchen, Registereintrag / Commercial Register No.: HRA 16 270, Pers?nlich haftender Gesellschafter / Personally Liable Partner: RUSEG Verwaltungs-GmbH, Sitz der Gesellschaft / Company's Place of Business: M?nchen, Registereintrag / Commercial Register No.: HRB 7 534, Umsatzsteuer-Identifikationsnummer (USt-IdNr.) / VAT Identification No.: DE 130 256 683, Elektro-Altger?te Register (EAR) / WEEE Register No.: DE 240 437 86 From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Markus Rohwedder Sent: Thursday, August 23, 2018 10:52 AM To: gpfsug main discussion list Subject: *EXT* [Newsletter] Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. 
If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:image003.png at 01D43AD3.9FE459C0] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the]Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? From: Keith Ball > To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 166 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.png Type: image/png Size: 4659 bytes Desc: image003.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image004.gif Type: image/gif Size: 105 bytes Desc: image004.gif URL: From daniel.kidger at uk.ibm.com Thu Aug 23 11:13:04 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Thu, 23 Aug 2018 10:13:04 +0000 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.2__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.3__=8FBB0861DFBF7B798f9e8a93df938690918c8FB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From rohwedder at de.ibm.com Thu Aug 23 12:50:32 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Thu, 23 Aug 2018 13:50:32 +0200 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI In-Reply-To: References: , Message-ID: Hello Juri, Keith, thank you for your responses. 
The internal services communicate on the privileged ports, for backwards compatibility and firewall simplicity reasons. We can not just assume all nodes in the cluster are at the latest level. Running two services at the same port on different IP addresses could be an option to consider for co-existance of the GUI and another service on the same node. However we have not set up, tested nor documented such a configuration as of today. Currently the GUI service manages the iptables redirect bring up and tear down. If this would be managed externally it would be possible to bind services to specific ports based on specific IPs. In order to create custom redirect rules based on IP address it is necessary to instruct the GUI to - not check for already used ports when the GUI service tries to start up - don't create/destroy port forwarding rules during GUI service start and stop. This GUI behavior can be configured using the internal flag UPDATE_IPTABLES in the service configuration with the 5.0.1.2 GUI code level. The service configuration is not stored in the cluster configuration and may be overwritten during code upgrades, so these settings may have to be added again after an upgrade. See this KC link: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1adv_firewallforgui.htm Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Daniel Kidger" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 23.08.2018 12:13 Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Keith, I have another IBM customer who also wished to move Scale GUI's https ports. In their case because they had their own web based management interface on the same https port. Is this the same reason that you have? If so I wonder how many other sites have the same issue? One workaround that was suggested at the time, was to add a second IP address to the node (piggy-backing on 'eth0'). Then run the two different GUIs, one per IP address. Is this an option, albeit a little ugly? Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-(0)7818 522 266 daniel.kidger at uk.ibm.com ----- Original message ----- From: "Markus Rohwedder" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Date: Thu, Aug 23, 2018 9:51 AM Hello Keith, it is not so easy. The GUI receives events from other scale components using the currently defined ports. Changing the GUI ports will cause breakage in the GUI stack at several places (internal watchdog functions, interlock with health events, interlock with CES). Therefore at this point there is no procedure to change this behaviour across all components. Because the GUI service does not run as root. the GUI server does not serve the privileged ports 80 and 443 directly but rather 47443 and 47080. Tweaking the ports in the server.xml file will only change the native ports that the GUI uses. The GUI manages IPTABLES rules to forward ports 443 and 80 to 47443 and 47080. If these ports are already used by another service, the GUI will not start up. 
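For readers wondering what "managed externally" could look like in practice: with the GUI's own port checking and iptables handling switched off via the UPDATE_IPTABLES flag described above, the redirect rules could in principle be created by hand and bound to a single address, leaving ports 80 and 443 on the node's other addresses free for a service such as xCAT. This is only a rough sketch, not a tested or supported configuration (as noted above), and 192.0.2.10 is a placeholder for the address the GUI should answer on:

    # Redirect HTTPS/HTTP arriving on the GUI's address to the GUI's native ports
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 443 -j REDIRECT --to-ports 47443
    iptables -t nat -A PREROUTING -d 192.0.2.10 -p tcp --dport 80  -j REDIRECT --to-ports 47080

Note that connections originating on the node itself do not traverse PREROUTING, so local access would additionally need matching rules in the nat OUTPUT chain, and the rules would have to be made persistent because the GUI would no longer recreate them at start.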
Making the GUI ports freely configurable is therefore not a strightforward change, and currently no on our roadmap. If you want to emphasize your case as future development item, please let me know. I would also be interested in: > Scale version you are running > Do you need port 80 or 443 as well? > Would it work for you if the xCAT service was bound to a single IP address? Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany Inactive hide details for Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Keith Ball ---22.08.2018 21:33:25---Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? From: Keith Ball To: gpfsug-discuss at spectrumscale.org Date: 22.08.2018 21:33 Subject: [gpfsug-discuss] Changing Web ports for the Spectrum Scale GUI Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, Does anyone know how to change the HTTP ports for the Spectrum Scale GUI? Any documentation or RedPaper I have found deftly avoids discussing this. The most promising thing I see is in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml: but it appears that port 80 specifically is used also by the GUI's Web service. I already have an HTTP server using port 80 for provisioning (xCAT), so would rather change the Specturm Scale GUI configuration if I can. Many Thanks, Keith _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17153317.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17310450.gif Type: image/gif Size: 60281 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu Aug 23 14:27:41 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 23 Aug 2018 13:27:41 +0000 Subject: [gpfsug-discuss] Call home Message-ID: <696B8436-17A4-4EEC-933E-7B1B0B13D498@bham.ac.uk> Hi, I?m just having a poke around with the callhome feature. If I use `mmcallhome group auto`, I can see that it creates a group. Now if I add a node to the cluster, how to I add that node to the same call home group that is already present? If I try for example: $ mmcallhome group add autoGroup_1 MYNEWSERVER --node all Failed to add this group: Group name "autoGroup_1" is already used ? 
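As Mathias Dietz answers below, the way around this is to let mmcallhome regenerate the grouping rather than add a node into the existing group; the "?force" in his reply appears to be the archive's rendering of the --force option. A minimal sketch, assuming the new node already meets the call home prerequisites:

    # Recreate the call home groups across all eligible nodes, overwriting the existing autoGroup_1
    mmcallhome group auto --force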
Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Thu Aug 23 14:57:43 2018 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Thu, 23 Aug 2018 13:57:43 +0000 Subject: [gpfsug-discuss] Call home In-Reply-To: <696B8436-17A4-4EEC-933E-7B1B0B13D498@bham.ac.uk> Message-ID: Hi Simon, Just recreate the group using mmcallhome group auto command together with ?force option to overwrite the existing group. Sent from my iPhone using IBM Verse On 23. Aug 2018, 15:27:51, S.J.Thompson at bham.ac.uk wrote: From: S.J.Thompson at bham.ac.uk To: gpfsug-discuss at spectrumscale.org Cc: Date: 23. Aug 2018, 15:27:51 Subject: [gpfsug-discuss] Call home Hi, I?m just having a poke around with the callhome feature. If I use `mmcallhome group auto`, I can see that it creates a group. Now if I add a node to the cluster, how to I add that node to the same call home group that is already present? If I try for example: $ mmcallhome group add autoGroup_1 MYNEWSERVER --node all Failed to add this group: Group name "autoGroup_1" is already used ? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 23 15:25:00 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 10:25:00 -0400 Subject: [gpfsug-discuss] Problem using group pool for migration In-Reply-To: References: Message-ID: Richard Powell, Good that you have it down to a smallish test case. Let's see it! Here's my test case. Notice I use -L 2 and -I test to see what's what: [root@/main/gpfs-git]$mmapplypolicy c41 -P /gh/c41gp.policy -L 2 -I test [I] GPFS Current Data Pool Utilization in KB and % Pool_Name KB_Occupied KB_Total Percent_Occupied cool 66048 9436160 0.699945741% system 1190656 8388608 14.193725586% xtra 66048 8388608 0.787353516% [I] 4045 of 65792 inodes used: 6.148164%. [I] Loaded policy rules from /gh/c41gp.policy. Evaluating policy rules with CURRENT_TIMESTAMP = 2018-08-23 at 14:11:26 UTC Parsed 2 policy rules. rule 'gp' group pool 'gp' is 'system' limit(3) then 'cool' limit(4) then 'xtra' rule 'mig' migrate from pool 'gp' to pool 'gp' weight(rand()) [I] 2018-08-23 at 14:11:26.367 Directory entries scanned: 8. [I] Directories scan: 7 files, 1 directories, 0 other objects, 0 'skipped' files and/or errors. [I] 2018-08-23 at 14:11:26.371 Sorting 8 file list records. [I] 2018-08-23 at 14:11:26.416 Policy evaluation. 8 files scanned. [I] 2018-08-23 at 14:11:26.421 Sorting 7 candidate file list records. WEIGHT(0.911647) MIGRATE /c41/100e TO POOL gp/cool SHOW() WEIGHT(0.840188) MIGRATE /c41/100a TO POOL gp/cool SHOW() WEIGHT(0.798440) MIGRATE /c41/100d TO POOL gp/cool SHOW() WEIGHT(0.783099) MIGRATE /c41/100c TO POOL gp/xtra SHOW() WEIGHT(0.394383) MIGRATE /c41/100b TO POOL gp/xtra SHOW() WEIGHT(0.335223) MIGRATE /c41/100g TO POOL gp/xtra SHOW() WEIGHT(0.197551) MIGRATE /c41/100f TO POOL gp/xtra SHOW() [I] 2018-08-23 at 14:11:26.430 Choosing candidate files. 7 records scanned. [I] Summary of Rule Applicability and File Choices: Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule 0 7 716800 7 716800 0 RULE 'mig' MIGRATE FROM POOL 'gp' WEIGHT(.) \ TO POOL 'gp' [I] Filesystem objects with no applicable rules: 1. 
[I] GPFS Policy Decisions and File Choice Totals: Chose to migrate 716800KB: 7 of 7 candidates; [I] File Migrations within Group Pools Group Pool Files_Out KB_Out Files_In KB_In gp system 7 716800 0 0 gp cool 0 0 3 307200 gp xtra 0 0 4 409600 Predicted Data Pool Utilization in KB and %: Pool_Name KB_Occupied KB_Total Percent_Occupied cool 373248 9436160 3.955507325% system 473856 8388608 5.648803711% xtra 475648 8388608 5.670166016% -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Aug 23 16:23:33 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 11:23:33 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Thu Aug 23 16:32:24 2018 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Thu, 23 Aug 2018 11:32:24 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> Message-ID: <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson > On Aug 23, 2018, at 11:23 AM, Marc A Kaplan wrote: > > Millions of files per directory, may well be a mistake... > > BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- > because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Aug 23 18:01:27 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 23 Aug 2018 13:01:27 -0400 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> Message-ID: Even with nfs or samba export you're probably okay as long as the application does not attempt to list the directory. Just probe it with stat/open/create/unlink. From: david_johnson at brown.edu To: gpfsug main discussion list Date: 08/23/2018 11:34 AM Subject: Re: [gpfsug-discuss] Those users.... 
millions of files per directory - not necessarily a mistake Sent by: gpfsug-discuss-bounces at spectrumscale.org But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson On Aug 23, 2018, at 11:23 AM, Marc A Kaplan wrote: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Aug 23 19:30:30 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 23 Aug 2018 18:30:30 +0000 Subject: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake In-Reply-To: References: <5CFDC4D5-CD5C-4293-83F4-FCA2C35B055F@nuance.com> <950BFBAE-145B-43DB-AE07-B3A17DC1795A@brown.edu> Message-ID: Thankfully all application developers completely understand why listing directories are a bad idea... ;o) Or at least they will learn the hard way otherwise, -B From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Marc A Kaplan Sent: Thursday, August 23, 2018 12:01 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake Note: External Email ________________________________ Even with nfs or samba export you're probably okay as long as the application does not attempt to list the directory. Just probe it with stat/open/create/unlink. From: david_johnson at brown.edu To: gpfsug main discussion list > Date: 08/23/2018 11:34 AM Subject: Re: [gpfsug-discuss] Those users.... millions of files per directory - not necessarily a mistake Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ But heaven help you if you export the gpfs on nfs or cifs. -- ddj Dave Johnson On Aug 23, 2018, at 11:23 AM, Marc A Kaplan > wrote: Millions of files per directory, may well be a mistake... BUT there are some very smart use cases that might take advantage of GPFS having good performance with large directories -- because GPFS uses extensible hashing -- it is better to store millions of files in a single GPFS directory than artificially scatter them among directories based on the mistaken notion that large directories are bad. (Yeah, they are in most implementations, but not in GPFS.) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. 
If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company's treatment of personal data, please email datarequests at jumptrading.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Aug 23 19:37:21 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 23 Aug 2018 18:37:21 +0000 Subject: [gpfsug-discuss] Are you attending IBM TechU in Hollywood, FL in October? Message-ID: <754D53F3-70C8-4481-9219-1665214C9302@nuance.com> Hi, if you are attending the IBM TechU in October, and are interested in giving a sort client perspective on Spectrum Scale, I?d like to hear from you. On October 15th, there will be a small ?mini-UG? session at this TechU and we?d like to include a client presentation. The rough outline is below, and as you can see it?s ?short and sweet?. Please drop me a note if you?d like to present. 10 mins ? Welcome & Introductions 45 mins ? Spectrum Scale/ESS Latest Enhancements and IBM Coral Project 30 mins - Spectrum Scale Use Cases 20 mins ? Spectrum Scale Client presentation 20 mins ? Spectrum Scale Roadmap 15 mins ? Questions & Close Close ? Drinks & Networking Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Sat Aug 25 01:12:08 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 24 Aug 2018 17:12:08 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> Message-ID: <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! > On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose wrote: > > All, don?t forget registration ends on the early side for this event due to background checks, etc. > > As noted below: > > IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. > > Hope you?ll be able to attend! > > Best, > Kristy > >> On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: >> >> All, >> >> Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: >> ? the draft agenda (bottom of page), >> ? a link to registration, register by September 1 due to ORNL site requirements (see next line) >> ? an important note about registration requirements for going to Oak Ridge National Lab >> ? 
a request for your site presentations >> ? information about HPCXXL and who to contact for information about joining, and >> ? other upcoming events. >> >> Hope you can attend and see Summit and Alpine first hand. >> >> Best, >> Kristy >> >> Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 >> >> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. >> >> ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. >> >> About HPCXXL: >> HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. >> The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. >> To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. >> >> Other upcoming GPFS/SS events: >> Sep 19+20 HPCXXL, Oak Ridge >> Aug 10 Meetup along TechU, Sydney >> Oct 24 NYC User Meeting, New York >> Nov 11 SC, Dallas >> Dec 12 CIUK, Manchester >> >> >> Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ >> Duration Start End Title >> >> Wednesday 19th, 2018 >> >> Speaker >> >> TBD >> Chris Maestas (IBM) TBD (IBM) >> TBD (IBM) >> John Lewars (IBM) >> >> *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) >> John Lewars (IBM) >> >> Carl Zetie (IBM) TBD >> >> TBD (ORNL) >> TBD (IBM) >> William Godoy (ORNL) Ted Hoover (IBM) >> >> Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All >> >> 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 >> >> 13:15 Welcome >> 13:45 What is new in Spectrum Scale? >> 14:00 What is new in ESS? 
>> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === >> 15:40 AWE >> 16:00 CSCS site report >> 16:20 Starfish (Sponsor talk) >> 16:50 Network Flow >> 17:20 RFEs >> 17:30 W rap-up >> >> Thursday 19th, 2018 >> >> 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 >> >> 08:50 Alpine ? the Summit file system >> 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library >> 10:00 AI Reference Architecture >> 10:30 === BREAK === >> 11:00 Encryption on the wire and on rest 11:30 Service Update >> 12:00 Open Forum >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ethan-Hereth at utc.edu Mon Aug 27 16:42:17 2018 From: Ethan-Hereth at utc.edu (Hereth, Ethan) Date: Mon, 27 Aug 2018 15:42:17 +0000 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov>, <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> Message-ID: Good morning gpfsug!! TLDR: What day is included in the free GPFS/SS UGM? Can somebody please confirm for me the date(s) for the free GPFS/SS workshop/UGM? Firstly, it appears as if it's on both the 19th and 20th, secondly, the Eventbrite form says that I need to be very accurate so I want to be sure. I'm just 1.5 hours away, so I'm hoping to drive up for the UGM. Cheers! -- Ethan Alan Hereth, PhD High Performance Computing Specialist SimCenter: National Center for Computational Engineering 701 East M.L. King Boulevard Chattanooga, TN 37403 [work]:423.425.5431 [cell]:423.991.4971 ethan-hereth at utc.edu www.utc.edu/simcenter ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose Sent: Friday, August 24, 2018 8:12:08 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose > wrote: All, don?t forget registration ends on the early side for this event due to background checks, etc. As noted below: IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Hope you?ll be able to attend! Best, Kristy On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: All, Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: ? the draft agenda (bottom of page), ? a link to registration, register by September 1 due to ORNL site requirements (see next line) ? an important note about registration requirements for going to Oak Ridge National Lab ? a request for your site presentations ? information about HPCXXL and who to contact for information about joining, and ? other upcoming events. Hope you can attend and see Summit and Alpine first hand. Best, Kristy Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. 
So don't wait too long to make your travel decisions. ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. About HPCXXL: HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. Other upcoming GPFS/SS events: Sep 19+20 HPCXXL, Oak Ridge Aug 10 Meetup along TechU, Sydney Oct 24 NYC User Meeting, New York Nov 11 SC, Dallas Dec 12 CIUK, Manchester Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ Duration Start End Title Wednesday 19th, 2018 Speaker TBD Chris Maestas (IBM) TBD (IBM) TBD (IBM) John Lewars (IBM) *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) John Lewars (IBM) Carl Zetie (IBM) TBD TBD (ORNL) TBD (IBM) William Godoy (ORNL) Ted Hoover (IBM) Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 13:15 Welcome 13:45 What is new in Spectrum Scale? 14:00 What is new in ESS? 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === 15:40 AWE 16:00 CSCS site report 16:20 Starfish (Sponsor talk) 16:50 Network Flow 17:20 RFEs 17:30 W rap-up Thursday 19th, 2018 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 08:50 Alpine ? the Summit file system 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library 10:00 AI Reference Architecture 10:30 === BREAK === 11:00 Encryption on the wire and on rest 11:30 Service Update 12:00 Open Forum -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kkr at lbl.gov Tue Aug 28 05:17:49 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 27 Aug 2018 21:17:49 -0700 Subject: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 In-Reply-To: References: <786CCEE4-6C37-46D4-8DE4-F9154AB150FE@lbl.gov> <4B5FBF0F-B59C-4485-BF08-E93FB66B97BD@lbl.gov> <1D31EBD3-CCC9-423B-83E9-3919C9A3DA1D@lbl.gov> Message-ID: <1802E998-2152-4FDE-9CE7-974203782317@lbl.gov> Two half-days are included. Wednesday 19th, 2018 starting 1p. Thursday 19th, 2018, starting 830 am. I believe there is a plan for a data center tour at the end of Thursday sessions "Summit Facility Tour? on the HPCXXL agenda. Let me know if there are other questions. -Kristy PS - Latest schedule is (PDF): > On Aug 27, 2018, at 8:42 AM, Hereth, Ethan wrote: > > Good morning gpfsug!! > > TLDR: What day is included in the free GPFS/SS UGM? > > Can somebody please confirm for me the date(s) for the free GPFS/SS workshop/UGM? Firstly, it appears as if it's on both the 19th and 20th, secondly, the Eventbrite form says that I need to be very accurate so I want to be sure. > > I'm just 1.5 hours away, so I'm hoping to drive up for the UGM. > > Cheers! > > -- > Ethan Alan Hereth, PhD > High Performance Computing Specialist > > SimCenter: National Center for Computational Engineering > 701 East M.L. King Boulevard > Chattanooga, TN 37403 > > [work]:423.425.5431 > [cell]:423.991.4971 > ethan-hereth at utc.edu > www.utc.edu/simcenter > From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Kristy Kallback-Rose > Sent: Friday, August 24, 2018 8:12:08 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS/SS UG Event at ORNL, Register by September 1 > > You may consider this an official nag-o-gram that the registration deadline is approaching. September 1st?don?t forget! > > >> On Aug 13, 2018, at 5:09 PM, Kristy Kallback-Rose > wrote: >> >> All, don?t forget registration ends on the early side for this event due to background checks, etc. >> >> As noted below: >> >> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. >> >> Hope you?ll be able to attend! >> >> Best, >> Kristy >> >>> On Aug 3, 2018, at 12:37 PM, Kristy Kallback-Rose > wrote: >>> >>> All, >>> >>> Here are some updates for the Spectrum Scale/GPFS UG Event at ORNL as part of the HPCXXL meeting. Below you will find: >>> ? the draft agenda (bottom of page), >>> ? a link to registration, register by September 1 due to ORNL site requirements (see next line) >>> ? an important note about registration requirements for going to Oak Ridge National Lab >>> ? a request for your site presentations >>> ? information about HPCXXL and who to contact for information about joining, and >>> ? other upcoming events. >>> >>> Hope you can attend and see Summit and Alpine first hand. >>> >>> Best, >>> Kristy >>> >>> Registration link, you can register just for GPFS/SS day at $0: https://www.eventbrite.com/e/hpcxxl-2018-summer-meeting-registration-47111539884 >>> >>> IMPORTANT: September 1st is the deadline to register for HPCXXL and the GPFS Day. Registration closes earlier than normal. This is due to the background check required to attend the event on site at ORNL. The access review process takes at least 3 weeks to complete for foreign nationals and 1 week to complete for US Citizens. So don't wait too long to make your travel decisions. >>> >>> ALSO: If you are interested in giving a site presentation, please let us know as we are trying to finalize the agenda. 
>>> >>> About HPCXXL: >>> HPCXXL is a user group for sites which have large supercomputing and storage installations. Because of the history of HPCXXL, the focus of the group is on large-scale scientific/technical computing using IBM or Lenovo hardware and software, but other vendor hardware and software is also welcome. Some of the areas we cover are: Applications, Code Development Tools, Communications, Networking, Parallel I/O, Resource Management, System Administration, and Training. We address topics across a wide range of issues that are important to sustained petascale scientific/technical computing on scaleable parallel machines. Some of the benefits of joining the group include knowledge sharing across members, NDA content availability from vendors, and access to vendor developers and support staff. >>> The HPCXXL user group is a self-organized and self-supporting group. Members and affiliates are expected to participate actively in the HPCXXL meetings and activities and to cover their own costs for participating. HPCXXL meetings are open only to members and affiliates of the HPCXXL. HPCXXL member institutions must have an appropriate non-disclosure agreement in place with IBM and Lenovo, since at times both vendors disclose and discuss information of a confidential nature with the group. >>> To join HPCXXL, a new organization needs to be sponsored by a current HPCXXL member or by the prospective member themselves. This process is straightforward and can be completed over email or in person when a representative attends their first meeting. If you are interested in learning more, please contact m.stephan at fz-juelich.de HPCXXL president Michael Stephan. >>> >>> Other upcoming GPFS/SS events: >>> Sep 19+20 HPCXXL, Oak Ridge >>> Aug 10 Meetup along TechU, Sydney >>> Oct 24 NYC User Meeting, New York >>> Nov 11 SC, Dallas >>> Dec 12 CIUK, Manchester >>> >>> >>> Draft agenda below, full HPCXXL meeting information here: http://hpcxxl.org/meetings/summer-2018-meeting/ >>> Duration Start End Title >>> Wednesday 19th, 2018 >>> Speaker >>> TBD >>> Chris Maestas (IBM) TBD (IBM) >>> TBD (IBM) >>> John Lewars (IBM) >>> *** TO BE CONFIRMED *** *** TO BE CONFIRMED *** TBD (Starfish) >>> John Lewars (IBM) >>> Carl Zetie (IBM) TBD >>> TBD (ORNL) >>> TBD (IBM) >>> William Godoy (ORNL) Ted Hoover (IBM) >>> Sandeep Ramesh (IBM) *** TO BE CONFIRMED *** All >>> 15 13:00 30 13:15 15 13:45 25 14:00 25 14:25 30 14:50 20 15:20 20 15:40 20 16:00 30 16:20 30 16:50 10 17:20 >>> 13:15 Welcome >>> 13:45 What is new in Spectrum Scale? >>> 14:00 What is new in ESS? >>> 14:25 Spinning up a Hadoop cluster on demand 14:50 Running Container on a Super Computer 15:20 === BREAK === >>> 15:40 AWE >>> 16:00 CSCS site report >>> 16:20 Starfish (Sponsor talk) >>> 16:50 Network Flow >>> 17:20 RFEs >>> 17:30 W rap-up >>> Thursday 19th, 2018 >>> 20 08:30 30 08:50 20 09:20 20 09:40 30 10:00 30 10:30 30 11:00 30 11:30 >>> 08:50 Alpine ? the Summit file system >>> 09:20 Performance enhancements for CORAL 09:40 ADIOS I/O library >>> 10:00 AI Reference Architecture >>> 10:30 === BREAK === >>> 11:00 Encryption on the wire and on rest 11:30 Service Update >>> 12:00 Open Forum >>> >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: SSUG18HPCXXL - Agenda - 2018-08-20.pdf Type: application/pdf Size: 109797 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL:
From kkr at lbl.gov Tue Aug 28 05:51:33 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 27 Aug 2018 21:51:33 -0700 Subject: [gpfsug-discuss] Hiring at NERSC Message-ID: <3721D290-56CB-4D82-9C70-1AF4E2D82CB9@lbl.gov>
Hi storage folks, We're hiring here at NERSC. There are two openings on the storage team at the National Energy Research Scientific Computing Center (NERSC, Berkeley, CA): one for a storage systems administrator and the other for a storage systems developer. If you have questions about the jobs or the area, let me know. Check the job postings out here: http://m.rfer.us/LBLlpzxG http://m.rfer.us/LBLmOKxH Cheers, Kristy
From r.sobey at imperial.ac.uk Tue Aug 28 11:09:23 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 28 Aug 2018 10:09:23 +0000 Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P In-Reply-To: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> References: <40D26CEA-B1B2-41BA-AF2B-06F91A1D7341@brown.edu> Message-ID:
I'm coming late to the party on this so forgive me, but I found that even with QoS I could not snapshot my filesets in a timely fashion, so my rebalancing could only run at weekends with snapshotting disabled. Richard
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of David Johnson Sent: 20 August 2018 17:55 To: gpfsug main discussion list Subject: [gpfsug-discuss] Rebalancing with mmrestripefs -P
I have one storage pool that was recently doubled, and another pool migrated there using mmapplypolicy. The new half is only 50% full, and the old half is 94% full.
Disks in storage pool: cit_10tb (Maximum disk size allowed is 516 TB)
d05_george_23  50.49T  23  No  Yes  25.91T ( 51%)  18.93G ( 0%)
d04_george_23  50.49T  23  No  Yes  25.91T ( 51%)  18.9G ( 0%)
d03_george_23  50.49T  23  No  Yes  25.9T ( 51%)   19.12G ( 0%)
d02_george_23  50.49T  23  No  Yes  25.9T ( 51%)   19.03G ( 0%)
d01_george_23  50.49T  23  No  Yes  25.9T ( 51%)   18.92G ( 0%)
d00_george_23  50.49T  23  No  Yes  25.91T ( 51%)  19.05G ( 0%)
d06_cit_33     50.49T  33  No  Yes  3.084T ( 6%)   70.35G ( 0%)
d07_cit_33     50.49T  33  No  Yes  3.084T ( 6%)   70.2G ( 0%)
d05_cit_33     50.49T  33  No  Yes  3.084T ( 6%)   69.93G ( 0%)
d04_cit_33     50.49T  33  No  Yes  3.085T ( 6%)   70.11G ( 0%)
d03_cit_33     50.49T  33  No  Yes  3.084T ( 6%)   70.08G ( 0%)
d02_cit_33     50.49T  33  No  Yes  3.083T ( 6%)   70.3G ( 0%)
d01_cit_33     50.49T  33  No  Yes  3.085T ( 6%)   70.25G ( 0%)
d00_cit_33     50.49T  33  No  Yes  3.083T ( 6%)   70.28G ( 0%)
-------------  --------------------  -------------------
(pool total)   706.9T               180.1T ( 25%)  675.5G ( 0%)
Will the command "mmrestripefs /gpfs -b -P cit_10tb" move the data blocks from the _cit_ NSDs to the _george_ NSDs, so that they end up all around 75% full? Thanks, -- ddj Dave Johnson Brown University CCV/CIS -------------- next part -------------- An HTML attachment was scrubbed... URL:
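[A minimal sketch of the throttled rebalance being discussed in this thread. These commands are illustrative, not from the original messages: the device name gpfs5 is a stand-in, the pool name is taken from the mmdf output above, the IOPS figure is arbitrary, and the exact mmchqos argument format should be checked against the documentation for your release.

# Cap maintenance I/O (which covers mmrestripefs) so user I/O keeps priority
mmchqos gpfs5 --enable pool=cit_10tb,maintenance=300IOPS,other=unlimited
# Rebalance only the expanded pool; -b redistributes existing data blocks
# across all disks in that pool, it does not move data between pools
mmrestripefs gpfs5 -b -P cit_10tb
# Watch QoS consumption and the resulting per-NSD distribution
mmlsqos gpfs5
mmdf gpfs5 -P cit_10tb
# Remove the throttle once the restripe has finished
mmchqos gpfs5 --disable
]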
From kenneth.waegeman at ugent.be Tue Aug 28 13:22:46 2018 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Tue, 28 Aug 2018 14:22:46 +0200 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC Message-ID:
Hi all, I was looking into HAWC, using the 'distributed fast storage in client nodes' method ( https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm ). This is achieved by putting a local device on the clients in the system.log pool. Reading another article ( https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm ), this pool would now be used for ALL file system recovery logs. Does this mean that if you have a (small) subset of clients with fast local devices added in the system.log pool, all other clients will use these too instead of the central system pool? Thank you! Kenneth
From dod2014 at med.cornell.edu Wed Aug 29 03:51:08 2018 From: dod2014 at med.cornell.edu (Douglas Duckworth) Date: Tue, 28 Aug 2018 22:51:08 -0400 Subject: [gpfsug-discuss] More Drives For DDN 12KX Message-ID:
Hi We have a 12KX which will be under support until 2020. Users are currently happy with throughput but we need greater capacity as approaching 80%. The enclosures are only half full. Does DDN require adding disks through them or can we get more 6TB SAS through someone else? We would want support contract for the new disks. If possible I think this would be a good stopgap solution until 2020 when we can buy a new faster cluster. Thank you for your feedback. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From skylar2 at uw.edu Wed Aug 29 04:55:55 2018 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 28 Aug 2018 20:55:55 -0700 Subject: [gpfsug-discuss] More Drives For DDN 12KX In-Reply-To: References: Message-ID: <20180829035555.GA32405@almaren>
I would ask DDN this, but my guess is that even if the drives work, you would run into support headaches proving that whatever problem you're running into isn't the result of 3rd-party drives. Even with supported drives, we've run into drive firmware issues with almost all of our storage systems (not just DDN, but Isilon, Hitachi, EMC, etc.); for supported drives, it's a hassle to prove and then get updated, but it would be even worse without support on your side.
On Tue, Aug 28, 2018 at 10:51:08PM -0400, Douglas Duckworth wrote: > Hi > > We have a 12KX which will be under support until 2020. Users are currently > happy with throughput but we need greater capacity as approaching 80%. > The enclosures are only half full. > > Does DDN require adding disks through them or can we get more 6TB SAS > through someone else? We would want support contract for the new disks. > If possible I think this would be a good stopgap solution until 2020 when > we can buy a new faster cluster. > > Thank you for your feedback. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine
From robert at strubi.ox.ac.uk Wed Aug 29 09:31:10 2018 From: robert at strubi.ox.ac.uk (Robert Esnouf) Date: Wed, 29 Aug 2018 09:31:10 +0100 Subject: [gpfsug-discuss] More Drives For DDN 12KX In-Reply-To: References: Message-ID:
Realistically I can't see why you'd want to risk invalidating the support contracts that you have in place. You'll also take on worrying about firmware etc. that is normally taken care of! You will need the caddies as well. We've just done this exercise with an SFA12KXE and 6TB SAS drives, and as well as doubling space we got significantly more performance (after mmrestripe, unless your network is the bottleneck). We left 10 free slots for a potential SSD upgrade (in case of a large increase in inodes or small files).
Regards, Robert -- Dr Robert Esnouf University Research Lecturer, Director of Research Computing BDI, Head of Research Computing Core WHG, NDM Research Computing Strategy Officer Main office: Room 10/028, Wellcome Centre for Human Genetics, Old Road Campus, Roosevelt Drive, Oxford OX3 7BN, UK Emails: robert at strubi.ox.ac.uk / robert at well.ox.ac.uk / robert.esnouf at bdi.ox.ac.uk Tel: (+44)-1865-287783 (WHG); (+44)-1865-743689 (BDI)
-----Original Message----- From: "Douglas Duckworth" To: gpfsug-discuss at spectrumscale.org Date: 29/08/18 04:49 Subject: [gpfsug-discuss] More Drives For DDN 12KX Hi We have a 12KX which will be under support until 2020. Users are currently happy with throughput but we need greater capacity as approaching 80%. The enclosures are only half full. Does DDN require adding disks through them or can we get more 6TB SAS through someone else? We would want support contract for the new disks. If possible I think this would be a good stopgap solution until 2020 when we can buy a new faster cluster. Thank you for your feedback. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jonathan.buzzard at strath.ac.uk Thu Aug 30 23:34:07 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 30 Aug 2018 23:34:07 +0100 Subject: [gpfsug-discuss] fast ACL alter solution In-Reply-To: <55CA6182.9010507@buzzard.me.uk> References: <201508111811.t7BIBYt0004336@d03av04.boulder.ibm.com> <55CA6182.9010507@buzzard.me.uk> Message-ID:
On 11/08/15 21:56, Jonathan Buzzard wrote: [SNIP] > > As I said previously what is needed is an "mm" version of the FreeBSD > setfacl command > > http://www.freebsd.org/cgi/man.cgi?format=html&query=setfacl(1) > > That has the -R/--recursive option of the Linux setfacl command which > uses the fast inode scanning GPFS API. > > You want to be able to type something like > > mmsetfacl -mR g:www:rpaRc::allow foo > > What you don't want to be doing is calling the abomination of a command > that is mmputacl. Frankly whoever is responsible for that command needs > taking out the back and given a good kicking.
A further three years down the line and setting NFSv4 ACL's on the Linux command line is still as painful as it was back in 2011. So I again have a requirement to set NFSv4 ACL's server side :-( Further, unfortunately somewhere in the last six years I lost my C code to do this :-( In the process of redoing it I have been looking at the source code for the Linux NFSv4 ACL tools. I think that with minimal modification they can be ported to GPFS. So far I have hacked up nfs4_getfacl to work, and it should not be too much extra effort to hack up nfs4_setfacl as well. However I have some questions. Firstly, what's the purpose of a special flag to indicate that it is smbd setting the ACL? Does this tie in with the undocumented "mmchfs -k samba" feature? Second, there is a whole bunch of stuff about v4.1 ACL's. How does one trigger that? All I seem to be able to do is get POSIX and v4 ACL's. Do you get v4.1 ACL's if you set the file system to "Samba" ACL's? Note that in the longer term I think it would be better to modify FreeBSD's setfacl/getfacl (say renamed to mmsetfacl and mmgetfacl) to do the job, on the basis that they handle both POSIX and NFSv4 ACL's in a single command. Perhaps an RFE? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
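[The GPFS calls such a ported tool would sit on can be sketched as below. This is an illustrative example, not code from the thread: it uses gpfs_getacl() and gpfs_putacl() from gpfs.h in their opaque form to copy an ACL unchanged from a template file to a target, with no recursion or ACE editing. The header/library paths and the buffer-size convention are assumptions to verify against the gpfs.h and documentation shipped with your release.

/* aclcopy.c -- sketch: copy a GPFS ACL (POSIX or NFSv4) from one file to
 * another via the GPFS programming interface.  A recursive mmsetfacl-style
 * tool could combine this with a directory walk or the inode scan API.
 * Typical build (paths may differ on your system):
 *   cc -I/usr/lpp/mmfs/include aclcopy.c -L/usr/lpp/mmfs/lib -lgpfs -o aclcopy
 */
#include <stdio.h>
#include <string.h>
#include <gpfs.h>

int main(int argc, char **argv)
{
    /* Opaque ACL buffer.  Per the gpfs_getacl() documentation the first int
     * of the buffer carries its total size on input; check the
     * gpfs_opaque_acl_t definition in your gpfs.h if this call fails.      */
    char buf[64 * 1024];

    if (argc != 3) {
        fprintf(stderr, "usage: %s <template-file> <target-file>\n", argv[0]);
        return 1;
    }
    memset(buf, 0, sizeof(buf));
    *(int *)buf = (int)sizeof(buf);

    /* flags == 0 requests the opaque, self-describing ACL format, which is
     * enough to replicate an ACL without interpreting its entries.         */
    if (gpfs_getacl(argv[1], 0, buf) != 0) {
        perror("gpfs_getacl");
        return 1;
    }
    if (gpfs_putacl(argv[2], 0, buf) != 0) {
        perror("gpfs_putacl");
        return 1;
    }
    return 0;
}
]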
From vtarasov at us.ibm.com Fri Aug 31 18:49:01 2018 From: vtarasov at us.ibm.com (Vasily Tarasov) Date: Fri, 31 Aug 2018 17:49:01 +0000 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL:
From S.J.Thompson at bham.ac.uk Fri Aug 31 19:25:34 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 31 Aug 2018 18:25:34 +0000 Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC In-Reply-To: References: Message-ID:
I'm going to add a note of caution about HAWC as well... Firstly, this was based on when it was first released, so things might have changed... HAWC replication uses the same failure group policy for placing replicas, therefore you need to use different failure groups for different client nodes. But do this carefully, thinking about your failure domains. For example, we initially set each node in a cluster with its own failure group, which might seem like a good idea until you shut the rack down (or even just a few select nodes might do it). You then lose your whole storage cluster by accident. (Or maybe you have HPC nodes and no UPS protection; if they hold HAWC data and there is no protected replica, you lose the file system.) Maybe this is obvious to everyone, but it bit us in various ways in our early testing. So if you plan to implement it, do test how your storage reacts when a client node fails. Simon
________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of vtarasov at us.ibm.com [vtarasov at us.ibm.com] Sent: 31 August 2018 18:49 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] system.log pool on client nodes for HAWC
That is correct. The blocks of each recovery log are striped across the devices in the system.log pool (if it is defined). As a result, even when all clients have a local device in the system.log pool, many writes to the recovery log will go to remote devices. For a client that lacks a local device in the system.log pool, log writes will always be remote. Notice that typically in such a setup you would enable log replication for HA. Otherwise, if a single client fails (and its recovery log is lost) the whole cluster fails, as there is no log to recover the file system to a consistent state. Therefore, at least one remote write is essential. HTH, -- Vasily Tarasov, Research Staff Member, Storage Systems Research, IBM Research - Almaden
----- Original message ----- From: Kenneth Waegeman Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [gpfsug-discuss] system.log pool on client nodes for HAWC Date: Tue, Aug 28, 2018 5:31 AM Hi all, I was looking into HAWC, using the 'distributed fast storage in client nodes' method ( https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_hawc_using.htm ). This is achieved by putting a local device on the clients in the system.log pool. Reading another article ( https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_syslogpool.htm ), this pool would now be used for ALL file system recovery logs. Does this mean that if you have a (small) subset of clients with fast local devices added in the system.log pool, all other clients will use these too instead of the central system pool?
Thank you! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
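[As a companion to the HAWC thread above, the setup being discussed might look roughly like the sketch below. None of this comes from the original messages: the file system name, node names, device paths, failure-group numbers, and the stanza attributes (in particular usage= and pool=system.log) are illustrative assumptions, and the options accepted by mmcrnsd, mmadddisk, and mmchfs should be checked against the HAWC documentation for your release.

# hawc-log.stanza -- hypothetical client-local log devices; distinct failure
# groups per rack so replicated log copies never land in the same rack
%nsd: nsd=c101_log device=/dev/nvme0n1 servers=client101 usage=metadataOnly failureGroup=101 pool=system.log
%nsd: nsd=c201_log device=/dev/nvme0n1 servers=client201 usage=metadataOnly failureGroup=201 pool=system.log

mmcrnsd -F hawc-log.stanza
mmadddisk gpfs5 -F hawc-log.stanza
# keep two copies of each recovery log and enable HAWC with a 64 KiB threshold
mmchfs gpfs5 --log-replicas 2 --write-cache-threshold 64K
]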